This functionally empty patch removes rq_sock and unamed union
from rqstp structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport sockaddr to the svc_xprt
structure. Convenience functions are added to set and
get the local and remote addresses of a transport from
the transport provider as well as determine the length
of a sockaddr.
A transport is responsible for setting the xpt_local
and xpt_remote addresses in the svc_xprt structure as
part of transport creation and xpo_accept processing. This
cannot be done in a generic way and in fact varies
between TCP, UDP and RDMA. A set of xpo_ functions
(e.g. getlocalname, getremotename) could have been
added but this would have resulted in additional
caching and copying of the addresses around. Note that
the xpt_local address should also be set on listening
endpoints; for TCP/RDMA this is done as part of
endpoint creation.
For connected transports like TCP and RDMA, the addresses
never change and can be set once and copied into the
rqstp structure for each request. For UDP, however, the
local and remote addresses may change for each request. In
this case, the address information is obtained from the
UDP recvmsg info and copied into the rqstp structure from
there.
A svc_xprt_local_port function was also added that returns
the local port given a transport. This is used by
svc_create_xprt when returning the port associated with
a newly created transport, and later when creating a
generic find transport service to check if a service is
already listening on a given port.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.
I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
With the implementation of the new mark and sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport dependent code, so export it.
Update the comment to clearly state the rules for calling this function.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.
The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.
This code races with module removal and transport addition.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
This is another incremental change that moves transport independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally null.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This functionally trivial change moves the transport independent sk_flags
field to the transport independent svc_xprt structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Change the atomic_t reference count to a kref and move it to the
transport indepenent svc_xprt structure. Change the reference count
wrapper names to be generic.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Modify the various kernel RPC svcs to use the svc_create_xprt service.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_create_xprt function is a transport independent version
of the svc_makesock function.
Since transport instance creation contains transport dependent and
independent components, add an xpo_create transport function. The
transport implementation of this function allocates the memory for the
endpoint, implements the transport dependent initialization logic, and
calls svc_xprt_init to initialize the transport independent field (svc_xprt)
in it's data structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic now uses a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) allows each transport to define its own accept processing.
A transport's initialization logic is reponsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.
In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.
Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug checks if called. The
UDP transport should never set the SK_LISTENER bit.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.
The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add transport specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueing the transport for I/O.
The xpo_free function frees all resources associated with the particular
transport instance.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The rqstp structure contains a pointer to the transport for the
RPC request. This functionaly trivial patch adds an unamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Make TCP and UDP svc_sock transports, and register them
with the svc transport core.
A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma. A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.
A transport class can be dynamically registered and unregisterd. The
svc_xprt_class represents the module that implements the transport
type and keeps reference counts on the module to avoid unloading while
there are active users.
The endpoint (svc_xprt) is a generic, transport independent endpoint that can
be used to send and receive data for an RPC service. It inherits it's
operations from the transport class.
A transport driver module registers and unregisters itself with svc sunrpc
by calling svc_reg_xprt_class, and svc_unreg_xprt_class respectively.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.
I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i
The AIX client mounts / -o sec=krb5:krb5i onto /mnt
If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network traceIwe see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.
Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.
In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.
The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to it's parent.
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/. (Thanks to Robert Day for
pointing this out.)
Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.
Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's really nothing much the caller can do if cache unregistration
fails. And indeed, all any caller does in this case is print an error
and continue. So just return void and move the printk's inside
cache_unregister.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If the reply cache initialization fails due to a kmalloc failure,
currently we try to soldier on with a reduced (or nonexistant) reply
cache.
Better to just fail immediately: the failure is then much easier to
understand and debug, and it could save us complexity in some later
code. (But actually, it doesn't help currently because the cache is
also turned off in some odd failure cases; we should probably find a
better way to handle those failure cases some day.)
Fix some minor style problems while we're at it, and rename
nfsd_cache_init() to remove the need for a comment describing it.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: For consistency, store the length of path name strings in nfsd
argument structures as unsigned integers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: adjust the sign of the length argument of nfsd_lookup and
nfsd_lookup_dentry, for consistency with recent changes. NFSD version
4 callers already pass an unsigned file name length.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: For consistency, store the length of file name strings in nfsd
argument structures as unsigned integers. This matches the XDR routines
and client argument structures for the same operation types.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
According to The Open Group's NLM specification, NLM callers are variable
length strings. XDR variable length strings use an unsigned 32 bit length.
And internally, negative string lengths are not meaningful for the Linux
NLM implementation.
Clean up: Make nlm_lock.len and nlm_reboot.len unsigned integers. This
makes the sign of NLM string lengths consistent with the sign of xdr_netobj
lengths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
XDR strings, opaques, and net objects should all use unsigned lengths.
To wit, RFC 4506 says:
4.2. Unsigned Integer
An XDR unsigned integer is a 32-bit datum that encodes a non-negative
integer in the range [0,4294967295].
...
4.11. String
The standard defines a string of n (numbered 0 through n-1) ASCII
bytes to be the number n encoded as an unsigned integer (as described
above), and followed by the n bytes of the string.
After this patch, xdr_decode_string_inplace now matches the other XDR
string and array helpers that take a string length argument. See:
xdr_encode_opaque_fixed, xdr_encode_opaque, xdr_encode_array
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'audit.b46' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
[AUDIT] Add uid, gid fields to ANOM_PROMISCUOUS message
[AUDIT] ratelimit printk messages audit
[patch 2/2] audit: complement va_copy with va_end()
[patch 1/2] kernel/audit.c: warning fix
[AUDIT] create context if auditing was ever enabled
[AUDIT] clean up audit_receive_msg()
[AUDIT] make audit=0 really stop audit messages
[AUDIT] break large execve argument logging into smaller messages
[AUDIT] include audit type in audit message when using printk
[AUDIT] do not panic on exclude messages in audit_log_pid_context()
[AUDIT] Add End of Event record
[AUDIT] add session id to audit messages
[AUDIT] collect uid, loginuid, and comm in OBJ_PID records
[AUDIT] return EINTR not ERESTART*
[PATCH] get rid of loginuid races
[PATCH] switch audit_get_loginuid() to task_struct *
This patch setups correctly MIMO PS mode flags
Signed-off-by: Guy Cohen <guy.cohen@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
execve arguments can be quite large. There is no limit on the number of
arguments and a 4G limit on the size of an argument.
this patch prints those aruguments in bite sized pieces. a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k
single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch adds an end of event record type. It will be sent by the kernel as
the last record when a multi-record event is triggered. This will aid realtime
analysis programs since they will now reliably know they have the last record
to complete an event. The audit daemon filters this and will not write it to
disk.
Signed-off-by: Steve Grubb <sgrubb redhat com>
Signed-off-by: Eric Paris <eparis@redhat.com>
In order to correlate audit records to an individual login add a session
id. This is incremented every time a user logs in and is included in
almost all messages which currently output the auid. The field is
labeled ses= or oses=
Signed-off-by: Eric Paris <eparis@redhat.com>
Keeping loginuid in audit_context is racy and results in messier
code. Taken to task_struct, out of the way of ->audit_context
changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoid section mismatch involving arch_register_cpu.
Marking arch_register_cpu as __init and removing the export
for non-hotplug-cpu configurations makes the following warning
go away:
Section mismatch in reference from the function
arch_register_cpu() to the function .devinit.text:register_cpu()
The function arch_register_cpu() references
the function __devinit register_cpu().
This is often because arch_register_cpu lacks a __devinit
annotation or the annotation of register_cpu is wrong.
The only external user of arch_register_cpu in the tree is
in drivers/acpi/processor_core.c where it is guarded by
ACPI_HOTPLUG_CPU (which depends on HOTPLUG_CPU).
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.
This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.
FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.
FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.
The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.
Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.
Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The exception fixup for the futex macros __futex_atomic_op1/2 and
futex_atomic_cmpxchg_inatomic() is missing an entry when the lock
prefix is replaced by a NOP via SMP alternatives.
Chuck Ebert tracked this down from the information provided in:
https://bugzilla.redhat.com/show_bug.cgi?id=429412
A possible solution would be to add another fixup after the
LOCK_PREFIX, so both the LOCK and NOP case have their own entry in the
exception table, but it's not really worth the trouble.
Simply replace LOCK_PREFIX with lock and keep those untouched by SMP
alternatives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow better diagnosis of tick-sched related, especially NOHZ
related problems, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xtime_cache needs to be updated whenever xtime and or wall_to_monotic
are changed. Otherwise users of xtime_cache might see a stale (and in
the case of timezone changes utterly wrong) value until the next
update happens.
Fixup the obvious places, which miss this update.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill swap_io_context()
as-iosched: fix inconsistent ioc->lock context
ide-cd: fix leftover data BUG
block: make elevator lib checkpatch compliant
cfq-iosched: make checkpatch compliant
block: make core bits checkpatch compliant
block: new end request handling interface should take unsigned byte counts
unexport add_disk_randomness
block/sunvdc.c:print_version() must be __devinit
splice: always updated atime in direct splice
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.
There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix problems with the 528x ColdFire CPU cache setup.
Do not cache the flash region (if present), and make the runtime
settings consistent with the init setting.
Problems pointed out by Bernd Buttner <b.buettner@mkc-gmbh.de>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The namespace is not available in the fib_sync_down_addr, add it as a
parameter.
Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new "flow" classifier, which is meant to extend the SFQ hashing
capabilities without hard-coding new hash functions and also allows
deterministic mappings of keys to classes, replacing some out of tree
iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps
IPs to marks, with fw filters to classes), ...
Some examples:
- Classic SFQ hash:
tc filter add ... flow hash \
keys src,dst,proto,proto-src,proto-dst divisor 1024
- Classic SFQ hash, but using information from conntrack to work properly in
combination with NAT:
tc filter add ... flow hash \
keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024
- Map destination IPs of 192.168.0.0/24 to classids 1-257:
tc filter add ... flow map \
key dst addend -192.168.0.0 divisor 256
- alternatively:
tc filter add ... flow map \
key dst and 0xff
- similar, but reverse ordered:
tc filter add ... flow map \
key dst and 0xff xor 0xff
Perturbation is currently not supported because we can't reliable kill the
timer on destruction.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping statistics and make internal queues visible as
classes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ipv4_devconf can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Outbound sequence number overflow error status
is counted as XfrmOutStateSeqError.
o Additionaly, it changes inbound sequence number replay
error name from XfrmInSeqOutOfWindow to XfrmInStateSeqError
to apply name scheme above.
o Inbound IPv4 UDP encapsuling type mismatch error is wrongly
mapped to XfrmInStateInvalid then this patch fiex the error
to XfrmInStateMismatch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current ip route cache implementation is not suited to large caches.
We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.
A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.
Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.
This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
Reuse the existing logic for multicast list synchronization for the
unicast address list. The core of dev_mc_sync/unsync are split out as
__dev_addr_sync/unsync and moved from dev_mcast.c to dev.c. These are
then used to implement dev_unicast_sync/unsync as well.
I'm working on cleaning up Intel's FCoE stack, which generates new MAC
addresses from the fibre channel device id assigned by the fabric as
per the current draft specification in T11. When using such a
protocol in a VLAN environment it would be nice to not always be
forced into promiscuous mode, assuming the underlying Ethernet driver
supports multiple unicast addresses as well.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Normally during a dump the key of the last dumped entry is used for
continuation, but since lock is dropped it might be lost. In that case
fallback to the old counter based N^2 behaviour. This means the dump
will end up skipping some routes which matches what FIB_HASH does.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet6_lookup and propagate it further.
Actually, this is tcp-v6 implementation of what was done for
tcp-v4 sockets in a previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a net argument to inet_lookup and propagate it further
into lookup calls. Plus tune the __inet_check_established.
The dccp and inet_diag, which use that lookup functions
pass the init_net into them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tags the inet_bind_bucket struct with net pointer,
initializes it during creation and makes a filtering
during lookup.
A better hashfn, that takes the net into account is to
be done in the future, but currently all bind buckets
with similar port will be in one hash chain.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions are the same except for what they call
to "check_established" and "hash" for a socket.
This saves half-a-kilo for ipv4 and ipv6.
add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546)
function old new delta
__inet_hash_connect - 577 +577
arp_ignore 108 113 +5
static.hint 8 4 -4
rt_worker_func 376 372 -4
inet6_hash_connect 584 25 -559
inet_hash_connect 586 25 -561
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have INET_MATCH, INET_TW_MATCH and INET6_MATCH to test sockets and
twbuckets for matching, but ipv6 twbuckets are tested manually.
Here's the INET6_TW_MATCH to help with it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constify a few data tables use const qualifiers on variables where
possible in the nf_*_proto_tcp sources.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces the xt_hashlimit match revision 1. It adds support for
kernel-level inversion and grouping source and/or destination IP
addresses, allowing to limit on a per-subnet basis. While this would
technically obsolete xt_limit, xt_hashlimit is a more expensive due
to the hashbucketing.
Kernel-level inversion: Previously you had to do user-level inversion:
iptables -N foo
iptables -A foo -m hashlimit --hashlimit(-upto) 5/s -j RETURN
iptables -A foo -j DROP
iptables -A INPUT -j foo
now it is simpler:
iptables -A INPUT -m hashlimit --hashlimit-over 5/s -j DROP
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename all "conntrack" variables to "ct" for more consistency and
avoiding some overly long lines.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder struct nf_conntrack_l4proto so all members used during packet
processing are in the same cacheline.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_ct_tuple_src_equal() and nf_ct_tuple_dst_equal() both compare the protocol
numbers. Unfortunately gcc doesn't optimize out the second comparison, so
remove it and prefix both functions with __ to indicate that they should not
be used directly.
Saves another 16 byte of text in __nf_conntrack_find() on x86_64:
nf_conntrack_tuple_taken | -20 # 320 -> 300, size inlines: 181 -> 161
__nf_conntrack_find | -16 # 267 -> 251, size inlines: 127 -> 115
__nf_conntrack_confirm | -40 # 875 -> 835, size inlines: 570 -> 537
3 functions changed, 76 bytes removed
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignoring specific entries in __nf_conntrack_find() is only needed by NAT
for nf_conntrack_tuple_taken(). Remove it from __nf_conntrack_find()
and make nf_conntrack_tuple_taken() search the hash itself.
Saves 54 bytes of text in the hotpath on x86_64:
__nf_conntrack_find | -54 # 321 -> 267, # inlines: 3 -> 2, size inlines: 181 -> 127
nf_conntrack_tuple_taken | +305 # 15 -> 320, lexblocks: 0 -> 3, # inlines: 0 -> 3, size inlines: 0 -> 181
nf_conntrack_find_get | -2 # 90 -> 88
3 functions changed, 305 bytes added, 56 bytes removed, diff: +249
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the RCU conversion only write_lock usages of nf_conntrack_lock are
left (except one read_lock that should actually use write_lock in the
H.323 helper). Switch to a spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU for expectation hash. This doesn't buy much for conntrack
runtime performance, but allows to reduce the use of nf_conntrack_lock
for /proc and nf_netlink_conntrack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
CHECK net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44: expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40: expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2: got int *<noident>
CHECK net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40: expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40: got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44: expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44: got unsigned int [usertype] *size
CHECK net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44: expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40: expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2: got int *<noident>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashtable size is really unsigned so sparse complains when you pass
a signed integer. Change all uses to make it consistent.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ranges to the new revision. This doesn't affect
compatibility since the new revision was not released yet.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace.
* arpt_register_table() registers table in supplied netns.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it's possible to list and manipulate per-netns ip6tables rules.
Filtering decisions are based on init_net's table so far.
P.S.: remove init_net check in inet6_create() to see the effect
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Propagate netns from userspace down to xt_find_table_lock()
* Register ip6 tables in netns (modules still use init_net)
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, iptables show and configure different set of rules in different
netnss'. Filtering decisions are still made by consulting only
init_net's set.
Changes are identical except naming so no splitting.
P.S.: one need to remove init_net checks in nf_sockopt.c and inet_create()
to see the effect.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.
So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.
Do it at once to not screw bisection.
P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
full netnsization of table modules is done.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.
Every user stubbed with init_net for a while.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch from 0/-E to ptr/PTR_ERR convention.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the xt_conntrack match revision 1 by port matching (all four
{orig,repl}{src,dst}) and by packet direction matching.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before the removal of the deferred output hooks, netoutdev was used in
case of VLANs on top of a bridge to store the VLAN device, so the
deferred hooks would see the correct output device. This isn't
necessary anymore since we're calling the output hooks for the correct
device directly in the IP stack.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address of IPv6 raw sockets was shown in the wrong format, from
IPv4 ones. The problem has been introduced by the commit
42a73808ed ("[RAW]: Consolidate proc
interface.")
Thanks to Adrian Bunk who originally noticed the problem.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different hashtables are used for IPv6 and IPv4 raw sockets, so no
need to check the socket family in the iterator over hashtables. Clean
this out.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A userspace program may wish to set the mark for each packets its send
without using the netfilter MARK target. Changing the mark can be used
for mark based routing without netfilter or for packet filtering.
It requires CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for combined mode algorithms with GCM being
the first algorithm supported.
Combined mode algorithms can be added through the xfrm_user interface
using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithms
is identified by its name and the ICV length.
For the purposes of matching algorithms in xfrm_tmpl structures,
combined mode algorithms occupy the same name space as encryption
algorithms. This is in line with how they are negotiated using IKE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts ESP to use the crypto_aead interface and in particular
the authenc algorithm. This lays the foundations for future support of
combined mode algorithms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move headers usbnet.h and rndis_host.h to include/linux/usb and fix includes
for drivers/net/usb modules. Headers are moved because rndis_wlan will be
outside drivers/net/usb in drivers/net/wireless and yet need these headers.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Teach rfkill about wimax radios.
Had to define a KEY_WIMAX as a 'key for disabling only wimax radios',
as other radio technologies have. This makes sense as hardware has
specific keys for disabling specific radios.
The RFKILL enabling part is, otherwise, a copy and paste of any other
radio technology.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa: (299 commits)
[ALSA] version 1.0.16rc2
[ALSA] hda: fix Mic in as output
[ALSA] emu10k1 - Another EMU0404 Board ID
[ALSA] emu10k1 - Fix kthread handling at resume
[ALSA] emu10k1: General cleanup, add new locks, fix alsa bug#3501, kernel bug#9304.
[ALSA] emu10k1 - Use enum for emu_model types
[ALSA] emu10k1 - Don't create emu1010 controls for non-emu boards
[ALSA] emu10k1 - 1616(M) cardbus improvements
[ALSA] snd:emu10k1: E-Mu updates. Fixes to firmware loading and support for 0404.
[ALSA] emu10k1: Add comments regarding E-Mu ins and outs.
[ALSA] oxygen: revert SPI clock frequency change for AK4396/WM8785
[ALSA] es1938 - improve capture hw pointer reads
[ALSA] HDA-Intel - Add support for Intel SCH
[ALSA] hda: Add GPIO mute support to STAC9205
[ALSA] hda-codec - Add Dell T3400 support
[ALSA] hda-codec - Add model for HP DV9553EG laptop
[ALSA] hda-codec - Control SPDIF as slave
[ALSA] hda_intel: ALSA HD Audio patch for Intel ICH10 DeviceID's
[ALSA] Fix Oops with PCM OSS sync
[ALSA] hda-codec - Add speaker automute to ALC262 HP models
...
bring back the avr32, blackfin, sh, sparc architectures into working order,
by reverting the effects of this change that came in via the x86 tree:
commit a5a19c63f4
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Wed Jan 30 13:33:39 2008 +0100
x86: demacro asm-x86/pgalloc_32.h
Sorry about that!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch improves E-Mu 1616(M) cardbus support. It adds definitions of the
new Microdock and 1010 cardbus registers (thanks again for descriptions
James) and improves mixer for this card. Now you can use S/PDIF and ADAT on
Mirodock and also use headpohone output on host cardbus card as another
independent output.
Signed-off-by: Ctirad Fertr <c.fertr@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This is improvement of the early support of the FM-only cards where the
fm801 chip represents the PCI to tuner bridge.
The tuner initialization isn't included the mute on as well as mute support
via V4L request. Proposed patch should fix this at least for 64-PCR model.
Signed-off-by: Andy Shevchenko <andy@smile.org.ua>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Change semantics for SNDRV_PCM_TSTAMP_MMAP. Doing timestamping only in
the interrupt handler might cause that hw_ptr is not related to actual
timestamp. With this change, grab timestamp at every hw_ptr update to
have always valid timestamp + ring buffer position pair.
With this change, SNDRV_PCM_TSTAMP_MMAP was renamed to
SNDRV_PCM_TSTAMP_ENABLE. It's no regression (I think).
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This fixes a bug whereby PCMs were not being suspended when the rest of the
audio subsystem was suspended.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Added a device level dapm event so that both the machine and codec are informed
when dapm events occur.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This header file exists only for some hacks to adapt alsa-driver
tree. It's useless for building in the kernel. Let's move a few
lines in it to sound/core.h and remove it.
With this patch, sound/driver.h isn't removed but has just a single
compile warning to include it. This should be really killed in
future.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
The 'tick' in PCM is set (again) via sw_params. And, nobody uses
this feature at all except for a command line option of aplay.
(This is literally 'nobody', as I checked alsa-lib API calls in all
programs in major distros.)
Above all, if we need finer wake-ups for the position update, it's
basically an issue that the driver should solve, not tuned by each
application.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
The xfer_align sw_params parameter has never been used in a sane manner,
and no one understands what this does exactly. The current
implementation looks also buggy because it allows write of shorter size
than xfer_align. So, if you do partial writes, the write isn't actually
aligned at all.
Removing this parameter will make some pcm_lib_* code more readable
(and less buggy).
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
This patch removes the indirect control access to the control elements.
The indirect access has never been used and is even broken on 32bit
ioctl wrapper. Let's clean it up.
The pointers still remain in snd_ctl_elem_* structs just to make sure
that the struct size won't change. Once after checking the size
consistency, we can get rid of them, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
info_oss: move prototype of snd_card_info_read_oss to info.h
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
We need an accurate and continuous (monotonic) time sources to do
accurate synchronization among more timing sources. This patch allows
to enable monotonic timestamps for ALSA PCM devices and enables monotonic
timestamps for ALSA timer devices.
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
* support for switching rate in STAC9460 - using set_rate_val of the akm
infrastructure
* listing all STAC9460 registers in proc
* disabling mpu401 device for Prodigy192 - otherwise the currently
flawed mpu401 code hangs kernel when opening the midi device
* removing old unused commented-out code
Signed-off-by: Pavel Hofman <dustin@seznam.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
remove dead config symbols from sound code
Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Updated the forgotten SNDRV_HWDEP_IFACE_LAST to point the really last member.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Remove sequencer instrument layer from the tree.
This mechanism hasn't been used much with the actual devices. The only
reasonable user was OPL3 loader, and now it was rewritten to use hwdep
instead. So, let's remove the rest of rotten codes.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Use the exclusive access lock in hwdep instead of the own one.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Use the hwdep device for loading OPL2/3 patch data instead of the
messy sequencer instrument layer.
Due to this change, the sbiload program should be updated, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (454 commits)
[POWERPC] Cell IOMMU fixed mapping support
[POWERPC] Split out the ioid fetching/checking logic
[POWERPC] Add support to cell_iommu_setup_page_tables() for multiple windows
[POWERPC] Split out the IOMMU logic from cell_dma_dev_setup()
[POWERPC] Split cell_iommu_setup_hardware() into two parts
[POWERPC] Split out the logic that allocates struct iommus
[POWERPC] Allocate the hash table under 1G on cell
[POWERPC] Add set_dma_ops() to match get_dma_ops()
[POWERPC] 83xx: Clean up / convert mpc83xx board DTS files to v1 format.
[POWERPC] 85xx: Only invalidate TLB0 and TLB1
[POWERPC] 83xx: Fix typo in mpc837x compatible entries
[POWERPC] 85xx: convert sbc85* boards to use machine_device_initcall
[POWERPC] 83xx: rework platform Kconfig
[POWERPC] 85xx: rework platform Kconfig
[POWERPC] 86xx: Remove unused IRQ defines
[POWERPC] QE: Explicitly set address-cells and size cells for muram
[POWERPC] Convert StorCenter DTS file to /dts-v1/ format.
[POWERPC] 86xx: Convert all 86xx DTS files to /dts-v1/ format.
[PPC] Remove 85xx from arch/ppc
[PPC] Remove 83xx from arch/ppc
...
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
alpha: fix x86.git merge build error
ia64: on UP percpu variables are not small memory model
x86: fix arch/x86/kernel/test_nx.c modular build bug
s390: use generic percpu linux-2.6.git
POWERPC: use generic per cpu
ia64: use generic percpu
SPARC64: use generic percpu
percpu: change Kconfig to HAVE_SETUP_PER_CPU_AREA
modules: fold percpu_modcopy into module.c
x86: export copy_from_user_ll_nocache[_nozero]
x86: fix duplicated TIF on 64-bit
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL
lguest: Use explicit includes rateher than indirect
lguest: get rid of lg variable assignments
lguest: change gpte_addr header
lguest: move changed bitmap to lg_cpu
lguest: move last_pages to lg_cpu
lguest: change last_guest to last_cpu
lguest: change spte_addr header
lguest: per-vcpu lguest pgdir management
lguest: make pending notifications per-vcpu
lguest: makes special fields be per-vcpu
lguest: per-vcpu lguest task management
lguest: replace lguest_arch with lg_cpu_arch.
lguest: make registers per-vcpu
lguest: make emulate_insn receive a vcpu struct.
lguest: map_switcher_in_guest() per-vcpu
lguest: per-vcpu interrupt processing.
lguest: per-vcpu lguest timers
lguest: make hypercalls use the vcpu struct
lguest: make write() operation smp aware
...
Manual conflict resolved (maybe even correctly, who knows) in
drivers/lguest/x86/core.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
security: compile capabilities by default
selinux: make selinux_set_mnt_opts() static
SELinux: Add warning messages on network denial due to error
SELinux: Add network ingress and egress control permission checks
NetLabel: Add auditing to the static labeling mechanism
NetLabel: Introduce static network labels for unlabeled connections
SELinux: Allow NetLabel to directly cache SIDs
SELinux: Enable dynamic enable/disable of the network access checks
SELinux: Better integration between peer labeling subsystems
SELinux: Add a new peer class and permissions to the Flask definitions
SELinux: Add a capabilities bitmap to SELinux policy version 22
SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
SELinux: Only store the network interface's ifindex
SELinux: Convert the netif code to use ifindex values
NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
NetLabel: Add secid token support to the NetLabel secattr struct
NetLabel: Consolidate the LSM domain mapping/hashing locks
NetLabel: Cleanup the LSM domain hash functions
NetLabel: Remove unneeded RCU read locks
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
PPC: Fix powerpc vio_find_name to not use devices_subsys
Driver core: add bus_find_device_by_name function
Module: check to see if we have a built in module with the same name
x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Driver core: Fix up build when CONFIG_BLOCK=N
a5a19c63f4 removed the include of
asm/pgalloc.h from asm-generic/tlb.h. That works fine on most
architectures, but broke ALPHA.
Fixup ALPHA by adding the include to asm-alpha/tlbflush.h
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tony says:
| The CONFIG_SMP=n path in ia64 makes quite radical changes ... rather
| than putting all the per-cpu stuff into the top 64K of address space
| and providing a per-cpu TLB mapping for that range to a different
| physical address ... it just makes all the per-cpu stuff link as ordinary
| variables in .data.
the new generic percpu code got confused about this as PER_CPU_ATTRIBUTES
was defined even on UP, so it picked up that small memory model - which
was not possible to get linked. The right fix is to only define that
on SMP. This resolved the build failures in my cross-compiling environment.
also link these variables into the .percpu section even on UP - some
assembly code has offset dependencies. (such as GET_IA64_MCA_DATA() in
arch/ia64/kernel/mca_asm.S)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Tony Luck <tony.luck@intel.com>
Change s390 percpu.h to use asm-generic/percpu.h
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Powerpc has a way to determine the address of the per cpu area of the
currently executing processor via the paca and the array of per cpu
offsets is avoided by looking up the per cpu area from the remote
paca's (copying x86_64).
Cc: Paul Mackerras <paulus@samba.org>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Olof Johansson <olof@lixom.net>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
ia64 has a special processor specific mapping that can be used to locate the
offset for the current per cpu area.
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sparc64 has a way of providing the base address for the per cpu area of the
currently executing processor in a global register.
Sparc64 also provides a way to calculate the address of a per cpu area
from a base address instead of performing an array lookup.
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 7e9916040b
and commit eee3af4a2c
Both use the same TIF number (25) in thread_info_64.h.
This patch changes the TIF ids.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.
[jejb: rebase on to sg_table and fix up rejections]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.
After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.
In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).
[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
IA64 also needs to see ioapic structure in irqchip.
Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch fixes a small issue where sturctures:
kvm_pic_state
kvm_ioapic_state
are defined inside x86 specific code and may or may not
be defined in anyway for other architectures. The problem
caused is one cannot compile userspace apps (ex. libkvm)
for other archs since a size cannot be determined for these
structures.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_cpuid_entry
kvm_cpuid
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Move structures:
kvm_sregs
kvm_msr_entry
kvm_msrs
kvm_msr_list
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_segment
kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structure kvm_regs to include/asm-x86/kvm.h.
Each architecture will need to create there own version of this
structure.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves structures:
kvm_pic_state
kvm_ioapic_state
to inclue/asm-x86/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch moves sturct kvm_memory_alias from include/linux/kvm.h
to include/asm-x86/kvm.h. Also have include/linux/kvm.h include
include/asm/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel. This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.
This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch refactors the current hypercall infrastructure to better
support live migration and SMP. It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.
A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace. There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used. There is no code
in tree that uses userspace hypercalls.
[avi: fix #ud injection on vmx]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PAE mode requires that we reload cr3 in order to guarantee that
changes to the pgd will be noticed by the processor. This means that
in principle pud_clear needs to reload cr3 every time. However,
because reloading cr3 implies a tlb flush, we want to avoid it where
possible.
pud_clear() is only used in a couple of places:
- in free_pmd_range(), when pulling down a range of process address space, and
- huge_pmd_unshare()
In both cases, the calling code will do a a tlb flush anyway, so
there's no need to do it within pud_clear().
In free_pmd_range(), the pud_clear is immediately followed by
pmd_free_tlb(); we can hook that to make the mmu_gather do an
unconditional full flush to make sure cr3 gets reloaded.
In huge_pmd_unshare, it is followed by flush_tlb_range, which always
results in a full cr3-reload tlb flush.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In x86 PAE mode, stop treating pmds as a special case. Previously
they were always allocated and freed with the pgd. The modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.
This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.
There is a complicating wart, however. When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3. Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid whereever possible.
This patch simply avoids reloading cr3 unless the update is to the
current pagetable. Later patches will optimise this further.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Yes! A mere 120 c_p_a() fixing and rewriting patches later,
we are now confident that we can enable UC by default for
ioremap(), on x86 too.
Every other architectures was doing this already. Doing so
makes Linux more robust against MTRR mixups (which might go
unnoticed if BIOS writers test other OSs only - where PAT
might override bad MTRRs defaults).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which lead to 1 page less being marked read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
change_page_attr_add is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
page_is_ram has a FIXME since ages, which reminds to sanity check the
BIOS area between 640k and 1M, which is sometimes falsely reported as
RAM in the e820 tables.
Implement the sanity check. Move the BIOS range defines from
pageattr.c into e820.h to avoid duplicate defines.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the introduction of the new API, no driver or non-archcore code needs
to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of CPA
(it's a horrible API after all).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Right now, if drivers or other code want to change, say, a cache attribute of a
page, the only API they have is change_page_attr(). c-p-a is a really bad API
for this, because it forces the caller to know *ALL* the attributes he wants
for the page, not just the 1 thing he wants to change. So code that wants to
set a page uncachable, needs to be aware of the NX status as well etc etc etc.
This patch introduces a set of new APIs for this, set_pages_<attr> and
set_memory_<attr>, that offer a logical change to the user, and leave all
attributes not implied by the requested logical change alone.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several
files which have #ifdef'ed defines which map either to end_pfn_map or
max_low_pfn. Replace this by a universal define and clean up all the
other instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
do some leftover cleanups in the now unified arch/x86/mm/pageattr.c
file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
get more testing of the c_p_a() code done by not turning off
PSE on DEBUG_PAGEALLOC.
this simplifies the early pagetable setup code, and tests
the largepage-splitup code quite heavily.
In the end, all the largepages will be split up pretty quickly,
so there's no difference to how DEBUG_PAGEALLOC worked before.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes some bugs of making EFI runtime code executable.
- Use change_page_attr in i386 too. Because the runtime code may be
mapped not through ioremap.
- If there is no _PAGE_NX in __supported_pte_mask, the change_page_attr
is not called.
- Make efi_ioremap map pages as PAGE_KERNEL_EXEC_NOCACHE, because EFI runtime
code may be mapped through efi_ioremap.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The SMP trampoline always runs in real mode, so making it executable
in the page tables doesn't make much sense because it executes
before page tables are set up. That was the only user of
set_kernel_exec(). Remove set_kernel_exec().
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The pte_* modifier functions that cleared bits dropped the NX bit on 32bit
PAE because they only worked in int, but NX is in bit 63. Fix that
by adding appropiate casts so that the arithmetic happens as long long
on PAE kernels.
I decided to just use 64bit arithmetic instead of open coding like
pte_modify() because gcc should generate good enough code for that now.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
64bit already had it.
Needed for later patches.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No need to make it 64bit there.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix a long-standing weakness of the early-ioremap allocator: it
uses a single pgd entry for the boot mappings, and was not properly
protecting itself against crossing a 2MB (4MB) boundary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
increase max early_ioremap() remapping size from 64K to 256K.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames bt_ioremap to early_ioremap, which is used in
x86_64. This makes it easier to merge i386 and x86_64 usage.
[ mingo@elte.hu: fix ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch replaces boot_ioremap invokation with bt_ioremap and
removes the boot_ioremap implementation.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes it possible for bt_ioremap() to be used before
paging_init(), via providing an early implementation of set_fixmap()
that can be used before paging_init().
This way boot_ioremap() can be replaced by bt_ioremap().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on this patch from Andi Kleen:
| Subject: CPA: Return the page table level in lookup_address()
| From: Andi Kleen <ak@suse.de>
|
| Needed for the next change.
|
| And change all the callers.
and ported it to x86.git.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Needed for some test code.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
- Rename it to pte_exec() from pte_exec_kernel(). There is nothing
kernel specific in there.
- Move it into the common file because _PAGE_NX is 0 on !PAE and then
pte_exec() will be always evaluate to true.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Prepare ioremap() to default to uncached. This will be the
safest - but first we have to fix CPA.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert macros into inline functions, for better type-checking.
This patch required a little bit of fiddling with headers in order to
make __(pte|pmd)_free_tlb inline rather than macros.
asm-generic/tlb.h includes asm/pgalloc.h, though it doesn't directly
use any pgalloc definitions. I removed this include to avoid an
include cycle, but it may cause secondary compile failures by things
depending on the indirect inclusion; arch/x86/mm/hugetlbpage.c was one
such place; there may be others.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add mm to paravirt_alloc_pd, partly to make it consistent with
paravirt_alloc_pt, and because later changes will make use of it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds support for the RDC R-321x system-on-chip,
also known as R-861x-(G). It uses the generic GPIO API and
has support for the on-chip hardware watchdog.
Build-fix from: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines the PCI identifiers found in
the RDC R-321x System-on-Chip.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds the generic GPIO support to the x86
architecture. We do the same as for MIPS, we let
the machine override the gpio callbacks and provide
defaults one in mach-generic.
Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The existing Geode GPIO API only allows for updating one GPIO at once. There
are instances where users want to update multiple GPIOs at once. With the
current API, they are given two choices; either ignore the GPIO API:
outl(0xc000, gpio_base + GPIO_OUTPUT_VAL);
outl(0xc000, gpio_base + GPIO_OUTPUT_ENABLE);
Alternatively, call each GPIO update separately:
geode_gpio_set(14, GPIO_OUTPUT_VAL);
geode_gpio_set(15, GPIO_OUTPUT_VAL);
geode_gpio_set(14, GPIO_OUTPUT_ENABLE);
geode_gpio_set(15, GPIO_OUTPUT_ENABLE);
Neither are desirable. This patch changes the GPIO API to allow for setting
of multiple GPIOs at once; rather than being passed an integer, we pass
a bitmask and provide a translation function. The above code would now
look like this:
geode_gpio_set(geode_gpio(14)|geode_gpio(15), GPIO_OUTPUT_VAL);
geode_gpio_set(geode_gpio(14)|geode_gpio(15), GPIO_OUTPUT_ENABLE);
Since there are no upstream users of the GPIO API yet (afaik), best to
change this now. This also adds a bit of sanity checking; it is no
longer possible to use a GPIO above 28.
Note the semantics of geode_gpio_isset() have changed:
geode_gpio_isset(geode_gpio(3)|geode_gpio(4), ...)
will only return true iff both GPIOs are set.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
when MTRRs are not covering the whole e820 table, we need to trim the
RAM and need to update e820.
reuse some code on 64-bit as well.
here need to add early_get_cap and use it in early_cpu_detect, and move
mtrr_bp_init early.
The code successfully trimmed the memory map on Justin's system:
from:
[ 0.000000] BIOS-e820: 0000000100000000 - 000000022c000000 (usable)
to:
[ 0.000000] modified: 0000000100000000 - 0000000228000000 (usable)
[ 0.000000] modified: 0000000228000000 - 000000022c000000 (reserved)
According to Justin it makes quite a difference:
| When I boot the box without any trimming it acts like a 286 or 386,
| takes about 10 minutes to boot (using raptor disks).
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Tested-by: Justin Piszcz <jpiszcz@lucidpixels.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
enabled, then interrupts don't work for the rtc-cmos driver which results in
RTC_AIE*, RTC_PIE* and RTC_ALM being unusable. This affects hwclock from
util-linux-ng at least on i386 since that uses RTC_PIE_ON. (For x86-64, a
polling method is used for unknown reasons.)
This patch series now
1. export the functions from arch/x86/kernel/hpet.c that the old char/rtc
driver uses to work around that problem,
2. makes it possible to compile the old rtc driver as module, while still
having CONFIG_HPET_EMULATE_RTC enabled and
3. makes use of the exported functions in (1) in the new rtc-cmos driver.
This patch:
This patch makes the RTC emulation functions in arch/x86/kernel/hpet.c usable
for kernel modules. It
- exports the functions (EXPORT_SYMBOL_GPL()),
- adds an interface to register the interrupt callback function
instead of using only a fixed callback function and
- replaces the rtc_get_rtc_time() function which depends on
CONFIG_RTC with a call to get_rtc_time() which is defined in
include/asm-generic/rtc.h.
The only dependency to CONFIG_RTC is the call to rtc_interrupt() which is
removed by the next patch. After this, there's no (code) dependency of
this functions to CONFIG_RTC=y any more.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Robert Picco <Robert.Picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Change the size of node ids for X86_64 from u8 to s16 to
accomodate more than 32k nodes and allow for NUMA_NO_NODE
(-1) to be sign extended to int.
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
call early_cpu_to_node() since per_cpu(cpu_to_node_map) might not be setup
yet.
I also had to export x86_cpu_to_node_map_early_ptr because of some calls
from the network code to numa_node_id():
net/ipv4/netfilter/arp_tables.c:
net/ipv4/netfilter/ip_tables.c:
net/ipv4/netfilter/ip_tables.c:
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Provide a means to trap usages of per_cpu map variables before
they are setup. Define CONFIG_DEBUG_PER_CPU_MAPS to activate.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>