UBI (Latin: "where?") manages multiple logical volumes on a single
flash device, specifically supporting NAND flash devices. UBI provides
a flexible partitioning concept which still allows for wear-levelling
across the whole flash device.
In a sense, UBI may be compared to the Logical Volume Manager
(LVM). Whereas LVM maps logical sector numbers to physical HDD sector
numbers, UBI maps logical eraseblocks to physical eraseblocks.
More information may be found at
http://www.linux-mtd.infradead.org/doc/ubi.html
Partitioning/Re-partitioning
An UBI volume occupies a certain number of erase blocks. This is
limited by a configured maximum volume size, which could also be
viewed as the partition size. Each individual UBI volume's size can
be changed independently of the other UBI volumes, provided that the
sum of all volume sizes doesn't exceed a certain limit.
UBI supports dynamic volumes and static volumes. Static volumes are
read-only and their contents are protected by CRC check sums.
Bad eraseblocks handling
UBI transparently handles bad eraseblocks. When a physical
eraseblock becomes bad, it is substituted by a good physical
eraseblock, and the user does not even notice this.
Scrubbing
On a NAND flash bit flips can occur on any write operation,
sometimes also on read. If bit flips persist on the device, at first
they can still be corrected by ECC, but once they accumulate,
correction will become impossible. Thus it is best to actively scrub
the affected eraseblock, by first copying it to a free eraseblock
and then erasing the original. The UBI layer performs this type of
scrubbing under the covers, transparently to the UBI volume users.
Erase Counts
UBI maintains an erase count header per eraseblock. This frees
higher-level layers (like file systems) from doing this and allows
for centralized erase count management instead. The erase counts are
used by the wear-levelling algorithm in the UBI layer. The algorithm
itself is exchangeable.
Booting from NAND
For booting directly from NAND flash the hardware must at least be
capable of fetching and executing a small portion of the NAND
flash. Some NAND flash controllers have this kind of support. They
usually limit the window to a few kilobytes in erase block 0. This
"initial program loader" (IPL) must then contain sufficient logic to
load and execute the next boot phase.
Due to bad eraseblocks, which may be randomly scattered over the
flash device, it is problematic to store the "secondary program
loader" (SPL) statically. Also, due to bit-flips it may become
corrupted over time. UBI allows to solve this problem gracefully by
storing the SPL in a small static UBI volume.
UBI volumes vs. static partitions
UBI volumes are still very similar to static MTD partitions:
* both consist of eraseblocks (logical eraseblocks in case of UBI
volumes, and physical eraseblocks in case of static partitions;
* both support three basic operations - read, write, erase.
But UBI volumes have the following advantages over traditional
static MTD partitions:
* there are no eraseblock wear-leveling constraints in case of UBI
volumes, so the user should not care about this;
* there are no bit-flips and bad eraseblocks in case of UBI volumes.
So, UBI volumes may be considered as flash devices with relaxed
restrictions.
Where can it be found?
Documentation, kernel code and applications can be found in the MTD
gits.
What are the applications for?
The applications help to create binary flash images for two purposes: pfi
files (partial flash images) for in-system update of UBI volumes, and plain
binary images, with or without OOB data in case of NAND, for a manufacturing
step. Furthermore some tools are/and will be created that allow flash content
analysis after a system has crashed..
Who did UBI?
The original ideas, where UBI is based on, were developed by Andreas
Arnez, Frank Haverkamp and Thomas Gleixner. Josh W. Boyer and some others
were involved too. The implementation of the kernel layer was done by Artem
B. Bityutskiy. The user-space applications and tools were written by Oliver
Lohmann with contributions from Frank Haverkamp, Andreas Arnez, and Artem.
Joern Engel contributed a patch which modifies JFFS2 so that it can be run on
a UBI volume. Thomas Gleixner did modifications to the NAND layer. Alexander
Schmidt made some testing work as well as core functionality improvements.
Signed-off-by: Artem B. Bityutskiy <dedekind@linutronix.de>
Signed-off-by: Frank Haverkamp <haver@vnet.ibm.com>
Major features:
1) Tagged queuing support.
2) Will properly negotiate for synchronous transfers even on
devices that reject the wide negotiation message, such as
CDROMs
3) Significantly lower kernel stack usage in interrupt
handler path by elimination of function vector arrays,
replaced by a top-level switch statement state machine.
4) Uses generic scsi infrastructure as much as possible to
avoid code duplication.
5) Automatic request of sense data in response to CHECK_CONDITION
6) Portable to other platforms using ESP such as DEC and Sun3
systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only unfortunate bit here is that the name field of struct map_info
is not const, so for now we put a cast on the assignment of it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
(akpm: remove CVS control string too)
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPIN_LOCK_UNLOCKED cleanup,use __SPIN_LOCK_UNLOCKED instead
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ROUND_UP macro cleanup use DIV_ROUND_UP
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed tun/tap driver's handling of hw addresses. The hw address is stored
in both the net_device.dev_addr and tun.dev_addr fields. These fields were
not kept synchronized, and in fact weren't even initialized to the same
value. Now during both init and when performing SIOCSIFHWADDR on the tun
device these values are both updated. However, if SIOCSIFHWADDR is
performed on the net device directly (for instance, setting the hw address
using ifconfig), the tun device does not get updated. Perhaps the
tun.dev_addr field should be removed completely at some point, as it is
redundant and net_device.dev_addr can be used anywhere it is used.
Signed-off-by: Brian Braunstein <linuxkernel@bristyle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/hamradio/yam.c: In function `yam_tx_byte':
drivers/net/hamradio/yam.c:643: warning: passing arg 1 of `skb_copy_from_linear_data_offset' from incompatible pointer type
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current e1000_xmit_frame spews raw interrupt disabled nag messages when
used with RT kernel patches. This patch uses spin_trylock_irqsave,
which allows RT patches to properly manage the irq semantics.
Signed-off-by: Mark Huth <mhuth@mvista.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Upon code inspection it was spotted that the firmware handover bit get/set
mismatched, which may have resulted in management issues on PCI-E
adapters. Setting them correctly may fix some management issues such
as arp routing etc.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
DEBUG_SHIRQ code exposed that e1000 was not ready for incoming interrupts
after having called pci_request_irq. This obviously requires us to finish
our software setup which assigns the irq handler before we request the
irq.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Correct minor typo in drivers/net/wireless/Kconfig identified by
Stefano Brivio <stefano.brivio@polimi.it>.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactors the wireless Kconfig all over and already
introduces net/wireless/Kconfig with just the WEXT bit for now,
the cfg80211 patch will add to that as well.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pppoe_flush_dev() kicks all sockets bound to a device that is going down.
In doing so, locks must be taken in the right order consistently (sock lock,
followed by the pppoe_hash_lock). However, the scan process is based on
us holding the sock lock. So, when something is found in the scan we must
release the lock we're holding and grab the sock lock.
This patch fixes race conditions between this code and pppoe_release(),
both of which perform similar functions but would naturally prefer to grab
locks in opposing orders. Both code paths are now going after these locks
in a consistent manner.
pppoe_hash_lock protects the contents of the "pppox_sock" objects that reside
inside the hash. Thus, NULL'ing out the pppoe_dev field should be done
under the protection of this lock.
Signed-off-by: Michal Ostrowski <mostrows@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
below you find a patch that fixes a memory leak when a PPPoE socket is
release()d after it has been connect()ed, but before the PPPIOCGCHAN ioctl
ever has been called on it.
This is somewhat of a security problem, too, since PPPoE sockets can be
created by any user, so any user can easily allocate all the machine's
RAM to non-swappable address space and thus DoS the system.
Is there any specific reason for PPPoE sockets being available to any
unprivileged process, BTW? After all, you need a packet socket for the
discovery stage anyway, so it's unlikely that any unprivileged process
will ever need to create a PPPoE socket, no? Allocating all session IDs
for a known AC is a kind of DoS, too, after all - with Juniper ERXes,
this is really easy, actually, since they don't ever assign session ids
above 8000 ...
Signed-off-by: Florian Zumbiehl <florz@florz.de>
Acked-by: Michal Ostrowski <mostrows@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
below you find a patch that (hopefully) fixes a race between an interface
going down and a connect() to a peer on that interface. Before,
connect() would determine that an interface is up, then the interface
could go down and all entries referring to that interface in the
item_hash_table would be marked as ZOMBIEs and their references to
the device would be freed, and after that, connect() would put a new
entry into the hash table referring to the device that meanwhile is
down already - which also would cause unregister_netdevice() to wait
until the socket has been release()d.
This patch does not suffice if we are not allowed to accept connect()s
referring to a device that we already acked a NETDEV_GOING_DOWN for
(that is: all references are only guaranteed to be freed after
NETDEV_DOWN has been acknowledged, not necessarily after the
NETDEV_GOING_DOWN already). And if we are allowed to, we could avoid
looking through the hash table upon NETDEV_GOING_DOWN completely and
only do that once we get the NETDEV_DOWN ...
mostrows:
pppoe_flush_dev is called on NETDEV_GOING_DOWN and NETDEV_DOWN to deal with
this "late connect" issue. Ideally one would hope to notify users at the
"NETDEV_GOING_DOWN" phase (just to pretend to be nice). However, it is the
NETDEV_DOWN scan that takes all the responsibility for ensuring nobody is
hanging around at that time.
Signed-off-by: Florian Zumbiehl <florz@florz.de>
Acked-by: Michal Ostrowski <mostrows@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
below is a patch that just removes dead code/initializers without any
effect (first access is an assignment) that I stumbled accross while
reading the source.
Signed-off-by: Florian Zumbiehl <florz@florz.de>
Acked-by: Michal Ostrowski <mostrows@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rusty added a new 'stats' field to struct net_device.
loopback driver can use it instead of declaring another struct
net_device_stats This saves some memory.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When csum_offset was introduced we did a conversion from csum to
csum_offset where applicable. A couple of drivers were missed in
this process.
It was harmless to begin with since the two fields coincided. Now
that we've made them different with the addition of csum_start, the
missed drivers must be converted or they can't send packets out at
all that require checksum offload.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
We have to put back the cast to "char *" because these
pointers are volatile.
Reported by Andrew Morton.
Signed-off-by: David S. Miller <davem@davemloft.net>
Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them. It would be nice if
this were done by default.
1) Add a new "stats" field to "struct net_device".
2) Add a new feature field to say "this driver uses the internal one"
3) Have a default "get_stats" which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
->get_stats is set.
This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.
Lightly tested: works for a modified lguest network driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get more randomness for secure_tcpv6_sequence_number(),
secure_tcp_sequence_number(), secure_dccp_sequence_number() functions,
we can use the high resolution time services, providing nanosec
resolution.
I've also done two kmalloc()/kzalloc() conversions.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To clearly state the intent of copying from linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For consistency with other skb data accessors, reducing the number of direct
accesses to skb->data.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reducing the number of skb->data direct accesses.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At that point it is equivalent to what was being used, skb->end - skb->data,
and the need is clearly the one skb_tailroom satisfies.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMC SuperIO Chip LPC47N227 used for IrDA is not detected because its device
identification byte can be 0x7A instead of 0x5A.
Patch from Peter Kovar <peter.kovar@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len,
that is precalculated tho, don't think we need to bloat skb with one more
member, so just use this new helper, reducing the number of non-skbuff.h
references to the layer headers even more.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open coded equivalent.
Ditched a no-op in bnx2 in the process.
I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->h.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple cases:
skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()
The next ones will handle the slightly more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...
Just out of curiosity, here are the idioms found to get the same result:
skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->nh.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency with all the other skb->nh.raw accessors.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency with all the other skb->nh.raw accessors.
Also do some really obvious simplifications in pppoe_recvmsg, well the
kfree_skb one is not so obvious, but free() and kfree() have the same behaviour
(hint :-) ).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the mac header, it is still legal to
touch skb->mac.raw directly if just adding to, subtracting from or setting it
to another layer header.
This one also converts some more cases to skb_reset_mac_header() that my
regex missed as it had no spaces before nor after '=', ugh.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the cases where we want to set skb->mac.raw to an offset from skb->data.
Simple cases first, the memmove ones and specially pktgen will be left for later.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One less thing for drivers writers to worry about.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For consistency with other skb->mac.raw users.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all the _type_trans routines are consistent in this regard.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the following not or no longer used exports:
- drivers/char/random.c: secure_tcp_sequence_number
- net/dccp/options.c: sysctl_dccp_feat_sequence_window
- net/netlink/af_netlink.c: netlink_set_err
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[PARPORT] SUNBPP: Fix OOPS when debugging is enabled.
[SPARC] openprom: Switch to ref counting PCI API
The packet driver is assuming (reasonably) that the (undocumented)
request.errors is an errno. But it is in fact some mysterious bitfield. When
things go wrong we return weird positive numbers to the VFS as pointers and it
goes oops.
Thanks to William Heimbigner for reporting and diagnosis.
(It doesn't oops, but this driver still doesn't work for William)
Cc: William Heimbigner <icxcnika@mar.tar.cc>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset). dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c. This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add "Modify Port" verb support to eHCA driver. The IB communication
manager needs this to set the IsCM port capability bit when
initializing.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq(). This cleans up the source a little and makes the
code smaller too:
add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function old new delta
ipoib_cm_tx_reap 403 406 +3
ipoib_cm_stale_task 146 145 -1
ipoib_cm_dev_stop 173 172 -1
ipoib_cm_tx_handler 964 956 -8
ipoib_cm_rx_handler 956 937 -19
ipoib_cm_skb_reap 212 190 -22
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle. Using the existing ib_init_ah_from_path() call will do this
for us.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
mthca_free_qp() already has local variables to hold the QP's send_cq
and recv_cq, so we can slightly clean up the calls to mthca_cq_clean()
by using those local variables instead of expressions like
to_mcq(qp->ibqp.send_cq).
Also, by cleaning the recv_cq first, we can avoid worrying about
whether the QP is attached to an SRQ for the second call, because we
would only clean send_cq if send_cq is not equal to recv_cq, and that
means send_cq cannot have any receive completions from the QP being
destroyed.
All this work even improves the generated code a bit:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
function old new delta
mthca_free_qp 510 505 -5
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit b2875d4c ("IB/mthca: Always fill MTTs from CPU") causes a crash
in mthca_write_mtt() with non-memfree HCAs that have their memory
hidden (that is, have only two PCI BARs instead of having a third BAR
that allows access to the RAM attached to the HCA) on 64-bit
architectures. This is because the commit just before, c20e20ab
("IB/mthca: Merge MR and FMR space on 64-bit systems") makes
dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and
hence mthca_write_mtt() tries to write directly into the HCA's MTT
table. However, since that table is in the HCA's memory, this is
impossible without the PCI BAR that gives access to that memory.
This causes a crash because mthca_tavor_write_mtt_seg() basically
tries to dereference some offset of a NULL pointer. Fix this by
adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always
use the WRITE_MTT firmware command rather than writing directly if
FMRs are not enabled.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Tweak a register setting to prevent the tx mailbox from halting.
Update version to 1.5.8.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc64:
drivers/net/hamradio/baycom_ser_fdx.c: In function `ser12_open':
drivers/net/hamradio/baycom_ser_fdx.c:417: error: `NR_IRQS' undeclared (first us
e in this function)
drivers/net/hamradio/baycom_ser_fdx.c:417: error: (Each undeclared identifier is
reported only once
drivers/net/hamradio/baycom_ser_fdx.c:417: error: for each function it appears i
n.)
Cc: Folkert van Heusden <folkert@vanheusden.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Broken by 4a1728a28a which switched the
return semantics of read_mii_word() but didn't fix usage of
read_mii_word() to conform to the new semantics.
Setting carrier to off based on the NO_CARRIER flag is also incorrect as
that flag only triggers on TX failure and therefore isn't correct when
no frames are being transmitted. Since there is already a 2*HZ MII
carrier check going on, defer to that.
Add a TRUST_LINK_STATUS feature flag for adapters where the LINK_STATUS
flag is actually correct, and use that rather than the NO_CARRIER flag.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The sis900 driver appears to have a bug in which the receive routine
passes the skbuff holding the received frame to the network stack before
refilling the buffer in the rx ring. If a new skbuff cannot be allocated, the
driver simply leaves a hole in the rx ring, which causes the driver to stop
receiving frames and become non-recoverable without an rmmod/insmod according to
reporters. This patch reverses that order, attempting to allocate a replacement
buffer first, and receiving the new frame only if one can be allocated. If no
skbuff can be allocated, the current skbuf in the rx ring is recycled, dropping
the current frame, but keeping the NIC operational.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The following patch fixes a kernel bug in depca_platform_probe().
We don't use a dynamic pointer for pldev->dev.platform_data, so it seems
that the correct way to proceed if platform_device_add(pldev) fails is
to explicitly set the pldev->dev.platform_data pointer to NULL, before
calling the platform_device_put(pldev), or it will be kfree'ed by
platform_device_release().
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Commit 40b36daa introduced possibility that serial8250_backup_timeout() ->
serial8250_handle_port() locks port.lock without disabling irqs, thus
allowing deadlock against interrupt handler (port.lock is acquired in
serial8250_interrupt()).
Spotted by lockdep.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two functions are called from __devinit context, but they are marked as
__init. Fix this.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On ia64, kernel headers define REGION_OFFSET so we can't use that.
Reported by Andrew Morton.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: David Hubbard <david.c.hubbard@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
irq values are u32, not u8. Large irq numbers will be truncated,
free_irq may free a different irq.
Remove incorrectly sized struct member and use the one from pci_dev.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use relative time, not absolute. Discovered by Jung-Ik (John) Lee
<jilee@google.com>.
Cc: Jung-Ik (John) Lee <jilee@google.com>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pcd_lock and pf_spin_lock are passed to blk_init_queue() which, seeing them
as valid lock pointer, sets it as ->queue_lock.
The problem is that pcd_lock and pf_spin_lock aren't initialized anywhere.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was schedule() missing in the TIOCMIWAIT ioctl. Solve it by moving
the code to the wait_event_interruptible.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jan Yenya Kasprzak <kas@fi.muni.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was schedule() missing in the TIOCMIWAIT ioctl. Solve it by moving
the code to the wait_event_interruptible.
Cc: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I encountered the following kernel panic. The cause of this problem was
NULL pointer access in check_modem_status() in 8250.c. I confirmed this
problem is fixed by the attached patch, but I don't know this is the
correct fix.
sadc[4378]: NaT consumption 2216203124768 [1]
Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan
container button sg e100 eepro100 mii ehci_hcd ohci_hcd
Pid: 4378, CPU 0, comm: sadc
psr : 00001210085a2010 ifs : 8000000000000289 ip : [<a000000100482071>]
Not tainted
ip is at check_modem_status+0xf1/0x360
Call Trace:
[<a000000100013940>] show_stack+0x40/0xa0
[<a0000001000145a0>] show_regs+0x840/0x880
[<a0000001000368e0>] die+0x1c0/0x2c0
[<a000000100036a30>] die_if_kernel+0x50/0x80
[<a000000100037c40>] ia64_fault+0x11e0/0x1300
[<a00000010000bdc0>] ia64_leave_kernel+0x0/0x280
[<a000000100482070>] check_modem_status+0xf0/0x360
[<a000000100482300>] serial8250_get_mctrl+0x20/0xa0
[<a000000100478170>] uart_read_proc+0x250/0x860
[<a0000001001c16d0>] proc_file_read+0x1d0/0x4c0
[<a0000001001394b0>] vfs_read+0x1b0/0x300
[<a000000100139cd0>] sys_read+0x70/0xe0
[<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
[<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
Fix the possible NULL pointer access in check_modem_status() in 8250.c. The
check_modem_status() would access 'info' member of uart_port structure, but it
is not initialized before uart_open() is called. The check_modem_status() can
be called through /proc/tty/driver/serial before uart_open() is called.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The debugging code would dereference __iomem pointers instead
of going through sbus_{read,write}{b,w,l}().
Signed-off-by: David S. Miller <davem@davemloft.net>
USRobotics Wireless Adapter (Model 5423) works well with current
zd1211rw driver also (i have tested 2.6.18, 2.6.20 and 2.6.21-rc7).
It just needs its ID added to the list of devices.
Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MAC address assignment at module loading is simply forgotten.
The bug at module unloading is caused by an incorrect call.
The bug at module unloading does not only happen for sunqe,
sunlance and sunhme (sbus) suffer from it too.
I've tested this on my SS20.
Signed-off-by: Marcel van Nies <morcles@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replacing kmalloc/memset combination with kzalloc.
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ide_hwif_to_major[] has only 10 entries as there are 10 major numbers
reserved for IDE (if somebody needs more it shouldn't be hard to fix).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The driver crashes the kernel on HPT302N chips due to the missing initializer
for 'hpt302n.settings' having been unfortunately overlooked so far. :-<
Much thanks to Mike Mattie for pin-pointing the reason of crash.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add PCI ID for a newer variant of cardbus CF/IDE adapter card.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This reverts commit 60cba200f1. It's been
linked to lockups of the e1000 hardware, see for example
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229603
but it's likely that the commit itself is not really introducing the
bug, but just allowing an unrelated problem to rear its ugly head (ie
one current working theory is that the code exposes us to a hardware
race condition by decreasing the amount of time we spend in each NAPI
poll cycle).
We'll revert it until root cause is known. Intel has a repeatable
reproduction on two different machines and bus traces of the hardware
doing something bad.
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A small number of SiS setups require special handling (not many judging
by how long this dumb bug survived). A couple of Fedora 7 devel testers
hit an Oops on pata_sis loading which is caused by terminal confusion
between chipset as 'the chipset we have found' and chipset as 'array
iterator'
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
From: Paul Mackerras <paulus@samba.org>
This fixes:
Subject: kernel BUG at net/core/skbuff.c in linux-2.6.21-rc6
process_input_packet() treats the case where the first byte is 0xff
(PPP_ALLSTATIONS) but the second byte is 0x03 (PPP_UI) as indicating a
packet with a PPP protocol number of 0xff. Arguably that's wrong
since PPP protocol 0xff is reserved, and the RFC does envision the
possibility of receiving frames where the control field has values
other than 0x03.
Signed-off-by: David S. Miller <davem@davemloft.net>
The Yukon FE (100mbit only) chips do not support large packets.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The Yukon EC Ultra chips have transmit settings for store and
forward and PCI buffering. By setting these appropriately, normal
performance goes from 750Mbytes/sec to 940Mbytes/sec (non-jumbo).
It is also possible to do Jumbo mode, but it means turning off
TSO and checksum offload so the performance gets worse. There isn't
enough buffering for checksum offload to work.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Need to make sure and disable ASF on all chip types. Otherwise, there may be
random reboots.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
There should never be descriptor error unless hardware or driver is buggy.
But if an error occurs, print useful information, clear irq, and recover.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This device is having all sorts of problems that lead to data corruption
and system instability. It gets receive status and data out of order,
it generates descriptor and TSO errors, etc.
Until the problems are resolved, it should not be used by anyone
who cares about there system.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The basic structure of "normal" UDP/IP/Ethernet
frames (that actually work):
- It starts with the Ethernet header (dest MAC, src MAC, etc.)
- The next part is occupied by the IP header (version info, length of
packet, id=0, fragment offset=0, checksum, from / to address, etc.)
- Then comes the UDP header (src / dest port, length, checksum)
- Actual payload
- Ethernet checksum
Now what's different for IP fragment:
- The IP header has id set to some value (same for all fragments),
offset is set appropriately (i.e. 0 for first fragment, following
according to size of other fragments), size is the length of the frame.
- UDP header is unchanged. I.e. length is according to full UDP
datagram, not just the part within the actual frame! But this is only
true within the first frame: all following frames don't have a valid
UDP-header at all.
The spidernet silicon seems to be quite intelligent: It's able to
compute (IP / UDP / Ethernet) checksums on the fly and tests if frames
are conforming to RFC -- at least conforming to RFC on complete frames.
But IP fragments are different as explained above:
I.e. for IP fragments containing part of a UDP datagram it sees
incompatible length in the headers for IP and UDP in the first frame
and, thus, skips this frame. But the content *is* correct for IP
fragments. For all following frames it finds (most probably) no valid
UDP header at all. But this *is* also correct for IP fragments.
The Linux IP-stack seems to be clever in this point. It expects the
spidernet to calculate the checksum (since the module claims to be able
to do so) and marks the skb's for "normal" frames accordingly
(ip_summed set to CHECKSUM_HW).
But for the IP fragments it does not expect the driver to be capable to
handle the frames appropriately. Thus all checksums are allready
computed. This is also flaged within the skb (ip_summed set to
CHECKSUM_NONE).
Unfortunately the spidernet driver ignores that hints. It tries to send
the IP fragments of UDP datagrams as normal UDP/IP frames. Since they
have different structure the silicon detects them the be not
"well-formed" and skips them.
The following one-liner against 2.6.21-rc2 changes this behavior. If the
IP-stack claims to have done the checksumming, the driver should not
try to checksum (and analyze) the frame but send it as is.
Signed-off-by: Norbert Eicker <n.eicker@fz-juelich.de>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove assumption that PHY interrupts use GPIOs 3 and 5.
Deal with PHY interrupts connected to any GPIO pins.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Reuse the incoming skb when a clientless abort req is recieved.
The release of RDMA connections HW resources might be deferred in
low memory situations.
Ensure that no further activity is passed up to the RDMA driver
for these connections.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Nonpae guest pdes are shadowed by two pae ptes, so we double the offset
twice: once to account for the pte size difference, and once because we
need to shadow pdes for a single guest pde.
But when writing to the upper guest pde we also need to truncate the
lower bits, otherwise the multiply shifts these bits into the pde index
and causes an access to the wrong shadow pde. If we're at the end of the
page (accessing the very last guest pde) we can even overflow into the
next host page and oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead. This means we can no longer use
a simple structure copy to copy stuff into user space.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't let userspace use the direct-physical-map L_key or R_key.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
At some point things changed so that all the affinity bits can be set,
but cpus_full() macro is not true. This caused problems with the unit
selection logic on multi-unit (board) configurations.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the code that shuts down the IB link earlier in the unload
process, to be sure no new packets can arrive while we are unloading.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To prevent random utility reads and writes of the diag interface to the
chip, we first require a handshake of reading from offset 0 and writing
to offset 0 before any other reads or writes can be done through the
diags device. Otherwise chip errors can be triggered.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the chip is no longer usable, LEDs should be turned off so system
can be found easily in the cluster.
Also some minor reorganizing so both chips print hardware error
message at same point and only if there were unrecovered errors
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a
pointer to it either stale (in re-init code) or NULL (in devdata)
Fixing the order of operations on this struct, and the condition for
interrupt access, prevents the crashes.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>