The serial UML OS-abstraction layer patch (um/kernel dir).
This moves all system calls from the initrd_user.c file under the os-Linux dir
and joins the initrd_user.c and initrd_kern.c files into a new file, initrd.c.
Signed-off-by: Gennady Sharapov <Gennady.V.Sharapov@intel.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Oleg Drokin: This patch is needed to support kernel modules that want to
use clear_user() (which is an exported symbol on all other architectures).
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Byte-swapping of the port and IP address passed in to the multicast driver by
the user used to happen in different places, which was a bug in itself. The
port also was swapped before being printk-ed, which led to a misleading
message. This patch moves the port swapping to the same place as the IP
address swapping. It also cleans up the error paths of mcast_open.
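
As a rough user-space illustration of keeping both conversions in one place
(this is not the UML multicast driver code; mcast_fill_addr is a hypothetical
helper), the port and the address can be put into network byte order together,
so the host-order port stays available for messages:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: takes a dotted-quad string and a host-order port,
 * and converts both to network byte order at a single point. */
static int mcast_fill_addr(struct sockaddr_in *sin, const char *addr,
			   uint16_t port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	/* inet_pton() stores the address in network byte order. */
	if (inet_pton(AF_INET, addr, &sin->sin_addr) != 1)
		return -1;
	/* The port is swapped here, next to the address, and nowhere else. */
	sin->sin_port = htons(port);
	return 0;
}
```

Callers would print the original host-order port rather than sin_port.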
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch cleans up the delay implementations a bit, makes the loops
unoptimizable, and exports __udelay and __const_udelay.
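
One common way to keep a compiler from deleting an empty delay loop is an
empty volatile asm statement in the loop body. The sketch below assumes GCC or
Clang inline asm and is not the code from this patch; spin_loops is a
hypothetical name:

```c
/* Sketch: busy-wait for a number of iterations. The empty volatile asm
 * must be executed on every iteration, so the loop cannot be removed. */
static unsigned long spin_loops(unsigned long loops)
{
	unsigned long done = 0;

	while (loops--) {
		__asm__ __volatile__("" : : : "memory");
		done++;
	}
	return done;
}
```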
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Any access to a PROT_NONE page should segfault the process. A JVM seems to do
this on purpose. Also, Al noticed some bogus code, which is now deleted.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some changes that I sent in didn't make 2.6.12-rc4 for some reason. This
adds them back. We have:
- an x86_64 definition of TOP_ADDR
- a reimplementation of the x86_64 csum_partial_copy_from_user
- some syntax fixes in arch/um/kernel/ptrace.c
- removal of a CFLAGS definition in the x86_64 Makefile
- some include changes in the x86_64 ptrace.c and user-offsets.h
- a syntax fix in elf-x86_64.h
Also moved an include in the i386 and x86_64 Makefiles to make the symlinks
work, and some small fixes from Al Viro.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you tried to open a packet device first in read-only mode and then a
second time in read-write mode, the second open succeeded even though the
device was not correctly set up for writing. If you then tried to write
data to the device, the writes would fail with I/O errors.
This patch prevents that problem by making the second open fail with
-EBUSY.
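
The rule can be sketched as follows. This is a simplified model, not the
packet driver's code; struct pkt_dev_state and pkt_open_sketch are
hypothetical names:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state for the sketch. */
struct pkt_dev_state {
	int refcnt;		/* number of current opens */
	bool write_setup;	/* device was set up for writing */
};

/* The first open decides whether the device is set up for writing.
 * A later read-write open of a read-only device is refused. */
static int pkt_open_sketch(struct pkt_dev_state *pd, bool want_write)
{
	if (pd->refcnt > 0 && want_write && !pd->write_setup)
		return -EBUSY;
	if (pd->refcnt == 0)
		pd->write_setup = want_write;
	pd->refcnt++;
	return 0;
}
```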
Signed-off-by: Peter Osterlund <petero2@telia.com>
Cc: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new period/dt setting routines don't get the coupling of these
parameters correct. This means that Domain Validation never gets DT
set, and thus the drive gets restricted to U80.
Fix this by restoring the couplings in the set routines.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Tampering with the settings has to be done under the host lock ...
slave_alloc isn't called under any lock, so this has to be done
explicitly.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The allocation of all of our components should be done in slave alloc.
Currently it's rather fancifully refcounted in the queuecommand
callback. This patch moves allocation and destroy to their correct
places in slave_alloc/slave_destroy. Now we can guarantee that
everywhere a device is requested, it's actually been allocated, so don't
check for this anymore.
Additionally, the per device busy timer was the only source of potential
use after free. It's been deleted because Linux does the correct thing
with busy returns, so there's no need to implement a separate timer in
the driver.
Finally, implement code that forces all the device parameters to zero
(i.e. async and narrow) in the slave alloc, inform the spi class of the
bios recorded maximums and wait until slave configure before trying
anything more adventurous.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This should finish the spurious queue removal from aic7xxx (there are
other queues that are probably unnecessary, but at least the major and
obviously unnecessary ones are done with).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This was rendered obsolete by the busyq removal; remove some of the last
remnants of its presence.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
pci_alloc_consistent is under 4G by default. Also simplify the
definition of bus_dmamap_t.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There's not much sense in sharing code anymore now that aic7xxx uses
various transport class facilities.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The aic7xxx driver has two spurious queues in its Linux glue code: the
busyq which queues incoming commands to the driver and the completeq
which queues finished commands before sending them back to the mid-layer.
This patch just removes the busyq and makes the aic finally return the
correct status to get the mid-layer to manage its queueing, so a command
is either committed to the sequencer or returned to the midlayer for
requeue.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is similar to the previous sym2 problem. For Domain Validation to
work we can't allow any period setting to turn wide on if it was
previously off.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
There's a basic need not to have parameters go under or over certain
values when doing domain validation. The basic ones are
max_offset, max_width and min_period.
This patch makes the transport class take and enforce these three
limits. Currently they can be set by the user, although they could
obviously be read from the HBA's on-board NVRAM area during
slave_configure (if it has one).
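
Enforcing the three limits amounts to clamping each requested value. A minimal
sketch, with hypothetical struct and function names that are not the transport
class's own:

```c
/* Hypothetical containers for the sketch. */
struct spi_limits_sketch {
	int max_offset;
	int max_width;
	int min_period;
};

struct spi_params_sketch {
	int offset;
	int width;
	int period;
};

/* Offset and width may not exceed their maximum; the period may not
 * drop below its minimum. */
static void spi_clamp_sketch(struct spi_params_sketch *p,
			     const struct spi_limits_sketch *lim)
{
	if (p->offset > lim->max_offset)
		p->offset = lim->max_offset;
	if (p->width > lim->max_width)
		p->width = lim->max_width;
	if (p->period < lim->min_period)
		p->period = lim->min_period;
}
```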
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The recent change to add a timeout to strbuf flushing had
a negative performance impact. The udelay()'s are too long,
and they were done in the wrong order wrt. the register read
checks. Fix both, and things are happy again.
There are more possible improvements in this area. In fact,
PCI streaming buffer flushing seems to be part of the bottleneck
in network receive performance on my SunBlade1000 box.
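
The ordering issue can be pictured with a generic polling loop: read the
completion flag first and delay only when it is not yet set. This is a sketch
with hypothetical names, and a counter stands in for a short udelay():

```c
/* Sketch: poll a completion flag up to limit times. The flag is read
 * before any delay, so an already completed flush costs no delay.
 * *delays counts how many times a udelay() would have been issued. */
static int wait_for_flag(volatile const unsigned long *flag,
			 unsigned int limit, unsigned int *delays)
{
	while (limit--) {
		if (*flag)
			return 0;
		(*delays)++;	/* a short udelay() would go here */
	}
	return -1;		/* timed out */
}
```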
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for sysfs to the IPMI device interface.
Clean-ups by Philipp Hahn, based on Dmitry Torokhov's comments.
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Philipp Hahn <pmhahn@titan.lahn.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes an uninitialized variable warning in arch/ppc/kernel/setup.c,
and this time gcc is actually right: there is a path that could result
in offset being uninitialized. Zero is a sane default in this instance.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Recently the __copy_tofrom_user routine was modified to avoid doing
prefetches past the end of the source array. However, in doing so we
introduced a bug in that it now returns the wrong value for the number
of bytes not copied when a fault is encountered. This fixes it to
return the correct number.
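
The return convention is the number of bytes left uncopied. A portable sketch
of that contract, where the fault_at parameter simulates the first offset that
would fault (hypothetical, and unrelated to the ppc assembly routine):

```c
#include <stddef.h>
#include <string.h>

/* Copies up to n bytes, stopping at the simulated fault offset.
 * Returns the number of bytes NOT copied; 0 means complete success. */
static size_t copy_with_fault(void *dst, const void *src, size_t n,
			      size_t fault_at)
{
	size_t ok = n < fault_at ? n : fault_at;

	memcpy(dst, src, ok);
	return n - ok;
}
```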
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are computing phys in the code below and never using it. This patch
takes out the redundant computation.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On ppc32, the platform code can supply a "progress" function that is
used to show progress through the boot. These functions are usually
in an init section and so can't be called after the init pages are
freed. Now that the cpu bringup code can be called after the system
is booted (for hotplug cpu) we can get the situation where the
progress function can be called after boot. The simple fix is to set
the progress function pointer to NULL when the init pages are freed,
and that is what this patch does (note that all callers already check
whether the function pointer is NULL before trying to call it).
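
The pattern is a guarded function pointer that is cleared when its target goes
away. A self-contained sketch with hypothetical names (the real pointer lives
in the ppc32 platform code):

```c
#include <stddef.h>

static int progress_calls;

/* Stands in for a platform progress function living in an init section. */
static void boot_progress(const char *msg, unsigned short code)
{
	(void)msg;
	(void)code;
	progress_calls++;
}

static void (*progress_fn)(const char *, unsigned short) = boot_progress;

/* Called where the init pages would be freed: the target is about to
 * disappear, so the pointer is cleared. */
static void free_init_sketch(void)
{
	progress_fn = NULL;
}

/* Every caller checks the pointer before calling through it. */
static void report_progress(const char *msg, unsigned short code)
{
	if (progress_fn)
		progress_fn(msg, code);
}
```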
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.
In netlink_broadcast() we're sending shared skb's to netlink listeners
when possible (saves some copying). This is OK, since we hold the only
other reference to the skb.
However, this implies that we must drop our reference on the skb, before
allowing a receiving socket to disappear. Otherwise, the socket buffer
accounting is disrupted.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cloned packets don't need the orphan call.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug causes:
assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122)
What's happening is that:
1) The skb is sent to socket 1.
2) Someone does a recvmsg on socket 1 and drops the ref on the skb.
Note that the rmem_alloc charge is not returned at this point since the
skb is still referenced.
3) The same skb is now sent to socket 2.
This version of the fix resurrects the skb_orphan call that was moved
out, last time we had 'shared-skb troubles'. It is practically a no-op
in the common case, but still prevents the possible race with recvmsg.
Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to verify that the payload contains enough data so that
attach_one_algo can copy alg_key_len bits from the payload.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable alg_key_len is in bits and not bytes. The function
attach_one_algo is currently using it as if it were in bytes.
This causes it to read memory which may not be there.
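
Converting a length in bits to bytes means rounding up to a whole byte. A
small sketch of that conversion, together with a length check of the kind a
payload needs before the key is copied (helper names are hypothetical):

```c
#include <stddef.h>

/* Bytes needed to hold a key whose length is given in bits. */
static size_t key_len_bytes(unsigned int alg_key_len_bits)
{
	return ((size_t)alg_key_len_bits + 7) / 8;
}

/* Returns 1 if a payload of payload_len bytes, starting with a header
 * of header_len bytes, has room for the whole key, else 0. */
static int payload_has_key(size_t payload_len, size_t header_len,
			   unsigned int alg_key_len_bits)
{
	if (payload_len < header_len)
		return 0;
	return payload_len - header_len >= key_len_bytes(alg_key_len_bits);
}
```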
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove extra __ip_vs_conn_put for incoming ICMP in direct routing
mode. Mark de Vries reports that IPVS connections are not leaked anymore.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it open-codes it, but that gets in the way of changing the
lookup_hash interface.
I'd prefer to disallow modular af_unix over exporting lookup_create,
but I'll leave that to you.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent the topdown allocator from allocating mmap areas all the way
down to address zero.
We still allow a MAP_FIXED mapping of page 0 (needed for various things,
ranging from Wine and DOSEMU to people who want to allow speculative
loads off a NULL pointer).
Tested by Chris Wright.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Having frag_list members which hold wmem of an sk leads to nightmares
with partially cloned frag skb's. The reason is that once you unleash
a skb with a frag_list that has individual sk ownerships into the stack
you can never undo those ownerships safely as they may have been cloned
by things like netfilter. Since we have to undo them in order to make
skb_linearize happy this approach leads to a dead-end.
So let's go the other way and make this an invariant:
For any skb on a frag_list, skb->sk must be NULL.
That is, the socket ownership always belongs to the head skb.
It turns out that the implementation is actually pretty simple.
The above invariant is actually violated in the following patch
for a short duration inside ip_fragment. This is OK because the
offending frag_list member is either destroyed at the end of the
slow path without being sent anywhere, or it is detached from
the frag_list before being sent.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like skb_cow_data() does not set
proper owner for newly created skb.
If we have several fragments for skb and some of them
are shared(?) or cloned (like in async IPsec) there
might be a situation when we require recreating skb and
thus using skb_copy() for it.
Newly created skb has neither a destructor nor a socket
associated with it, which must be copied from the old skb.
As far as I can see, current code sets destructor and socket
for the first skb only and uses truesize of the first skb
only to increment sk_wmem_alloc value.
If above "analysis" is correct then attached patch fixes that.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract DMA boundary bit selection into a separate
function, tg3_calc_dma_bndry(). Call this from
tg3_test_dma().
Make DMA test more reliable by using no DMA boundary
setting during the test. If the test passes, then
use the setting we selected before the test.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Even though we do software interrupt mitigation
via NAPI, it still helps to have some minimal
hw assisted mitigation.
This helps, particularly, on systems where register
I/O overhead is much greater than the CPU horsepower.
For example, it helps on NUMA systems. In such cases
the PIO overhead to disable interrupts for NAPI accounts
for the majority of the packet processing cost. The
CPU is fast enough such that only a single packet is
processed by each NAPI poll call.
Thanks to Michael Chan for reviewing this patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
When supported, use the TAGGED interrupt processing support
the chip provides. In this mode, instead of an "on/off" binary
semaphore, an incrementing tag scheme is used to ACK interrupts.
All MSI supporting chips support TAGGED mode, so the tg3_msi()
interrupt handler uses it unconditionally. This invariant is
verified when MSI support is tested.
Since we can invoke tg3_poll() multiple times per interrupt under
high packet load, we fetch a new copy of the tag value in the
status block right before we actually do the work.
Also, because the tagged status tells the chip exactly which
work we have processed, we can make two optimizations:
1) tg3_restart_ints() need not check tg3_has_work()
2) the tg3_timer() need not poke the chip 10 times per
second to keep from losing interrupt events
Based upon valuable feedback from Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid console spam with ext3 aborted journal.
ext3 usually reports error conditions that it detects in its environment.
But when its journal gets aborted due to such errors, it can sometimes
continue to report that condition forever, spamming the console to such
an extent that the original cause of the journal abort can be lost.
When the journal aborts, we put the filesystem into readonly mode. Most
subsequent filesystem operations will get rejected immediately by checks
for MS_RDONLY either in the filesystem or in the VFS. But some paths do
not have such checks --- for example, if we continue to write to a file
handle that was opened before the fs went readonly. (We only check for
the ROFS condition when the file is first opened.) In these cases, we
can continue to generate log errors similar to
EXT3-fs error (device $DEV) in start_transaction: Journal has aborted
for each subsequent write.
There is really no point in generating these errors after the initial
error has been fully reported. Specifically, if we're starting a
completely new filesystem operation, and the filesystem is *already*
readonly (ie. the ext3 layer has already detected and handled the
underlying jbd abort), and we see an EROFS error, then there is simply
no point in reporting it again.
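
The decision described above reduces to a single predicate. A sketch with
hypothetical names, not the ext3 code:

```c
#include <errno.h>
#include <stdbool.h>

/* Returns false only for the case described above: a new filesystem
 * operation hits -EROFS on a filesystem that is already read-only, so
 * the underlying abort has been reported before. */
static bool should_report_error(int err, bool new_operation,
				bool fs_readonly)
{
	if (err == -EROFS && new_operation && fs_readonly)
		return false;
	return true;
}
```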
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't pass meaningless file handles to block device ioctls.
The recent raw IO ioctl-passthrough fix started passing the raw file
handle into the block device ioctl handler. That's unlikely to be
useful, as the file handle is actually open on a character-mode raw
device, not a block device, so dereferencing it is not going to yield
useful results to a block device ioctl handler.
Previously we just passed NULL; also not a value that can usefully
be dereferenced, but at least if it does happen, we'll oops instead of
silently pretending that the file is a block device, so NULL is the more
defensive option here. This patch reverts to that behaviour.
Noticed by Al Viro.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The driver model has a "detach_state" mechanism that:
- Has never been used by any in-kernel driver;
- Is superfluous, since driver remove() methods can do the same thing;
- Became buggy when the suspend() parameter changed semantics and type;
- Could self-deadlock when called from certain suspend contexts;
- Is effectively wasted documentation, object code, and headspace.
This removes that "detach_state" mechanism, giving a net code shrink as well
as a per-device saving in the driver model and sysfs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch includes various tweaks in the messaging that appears during
system pm state transitions:
* Warn about certain illegal calls in the device tree, like resuming
child before parent or suspending parent before child. This could
happen easily enough through sysfs, or in some cases when drivers
use device_pm_set_parent().
* Be more consistent about dev_dbg() tracing ... do it for resume() and
shutdown() too, and never if the driver doesn't have that method.
* Say which type of system sleep state is being entered.
Except for the warnings, these only affect debug messaging.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>