Commit Graph

304116 Commits

Author SHA1 Message Date
Sarah Sharp
dbc33303e4 xhci: Reserve one command for USB3 LPM disable.
We want to do everything we can to ensure that USB 3.0 Link Power
Management (LPM) can be disabled when it is enabled.  If LPM can't be
disabled, we can't suspend USB 3.0 devices, or reset them.  To make sure
we can submit the command to disable LPM, allocate a command in the
xhci_hcd structure, and reserve one TRB on the command ring.

We only need one command per xHCI driver instance, because LPM is only
disabled or enabled while the USB core is holding the bandwidth_mutex
that is shared between the xHCI USB 2.0 and USB 3.0 roothubs.  The
bandwidth_mutex will be held until the command completes, or times out.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:42:01 -07:00
Sarah Sharp
4b2665418c xhci: Some Evaluate Context commands must succeed.
The upcoming USB 3.0 Link PM patches will introduce new API to enable
and disable low-power link states.  We must be able to disable LPM in
order to reset a device, or place the device into U3 (device suspend).
Therefore, we need to make sure the Evaluate Context command to disable
the LPM timeouts can't fail due to there being no room on the command
ring.

Introduce a new flag to the function that queues the Evaluate Context
command, command_must_succeed.  This tells the ring handler that a TRB
has already been reserved for the command (by incrementing
xhci->cmd_ring_reserved_trbs), and basically ensures that prepare_ring()
won't fail.  A similar flag was already implemented for the Configure
Endpoint command queuing function.

All functions that currently call xhci_configure_endpoint() to issue an
Evaluate Context command pass "false" for the "must_succeed" parameter,
so this patch should have no effect on current xHCI driver behavior.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:42:00 -07:00
Sarah Sharp
8306095fd2 USB: Disable USB 3.0 LPM in critical sections.
There are several places where the USB core needs to disable USB 3.0
Link PM:
 - usb_bind_interface
 - usb_unbind_interface
 - usb_driver_claim_interface
 - usb_port_suspend/usb_port_resume
 - usb_reset_and_verify_device
 - usb_set_interface
 - usb_reset_configuration
 - usb_set_configuration

Use the new LPM disable/enable functions to temporarily disable LPM
around these critical sections.

We need to protect the critical section around binding and unbinding USB
interface drivers.  USB drivers may want to disable hub-initiated USB
3.0 LPM, which will change the value of the U1/U2 timeouts that the xHCI
driver will install.  We need to disable LPM completely until the driver
is bound to the interface, and the driver has a chance to enable
whatever alternate interface setting it needs in its probe routine.
Then re-enable USB3 LPM, and recalculate the U1/U2 timeout values.

We also need to disable LPM in usb_driver_claim_interface,
because drivers like usbfs can bind to an interface through that
function.  Note, there is no way currently for userspace drivers to
disable hub-initiated USB 3.0 LPM.  Revisit this later.

When a driver is unbound, the U1/U2 timeouts may change because we are
unbinding the last driver that needed hub-initiated USB 3.0 LPM to be
disabled.

USB LPM must be disabled when a USB device is going to be suspended.
The USB 3.0 spec does not define a state transition from U1 or U2 into
U3, so we need to bring the device into U0 by disabling LPM before we
can place it into U3.  Therefore, call usb_unlocked_disable_lpm() in
usb_port_suspend(), and call usb_unlocked_enable_lpm() in
usb_port_resume().  If the port suspend fails, make sure to re-enable
LPM by calling usb_unlocked_enable_lpm(), since usb_port_resume() will
not be called on a failed port suspend.

USB 3.0 devices lose their USB 3.0 LPM settings (including whether USB
device-initiated LPM is enabled) across device suspend.  Therefore,
disable LPM before the device will be reset in
usb_reset_and_verify_device(), and re-enable LPM after the reset is
complete and the configuration/alt settings are re-installed.

The calculated U1/U2 timeout values are heavily dependent on what USB
device endpoints are currently enabled.  When any of the enabled
endpoints on the device might change, due to a new configuration, or new
alternate interface setting, we need to first disable USB 3.0 LPM, add
or delete endpoints from the xHCI schedule, install the new interfaces
and alt settings, and then re-enable LPM.  Do this in usb_set_interface,
usb_reset_configuration, and usb_set_configuration.

Basically, there is a call to disable and then enable LPM in all
functions that lock the bandwidth_mutex.  One exception is
usb_disable_device, because the device is disconnecting or otherwise
going away, and we should not care about whether USB 3.0 LPM is enabled.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:41:59 -07:00
Sarah Sharp
1ea7e0e8e3 USB: Add support to enable/disable USB3 link states.
There are various functions within the USB core that will need to
disable USB 3.0 link power states.  For example, when a USB device
driver is being bound to an interface, we need to disable USB 3.0 LPM
until we know if the driver will allow hub-initiated LPM transitions.
Another example is when the USB core is switching alternate interface
settings.  The USB 3.0 timeout values are dependent on what endpoints
are enabled, so we want to ensure that LPM is disabled until the new alt
setting is fully installed.

Multiple functions need to disable LPM, and those functions can even be
nested.  For example, usb_bind_interface() could disable LPM, and then
call into the driver probe function, which may attempt to switch to a
different alt setting.  Therefore, we need to keep a count of the number
of functions that require LPM to be disabled at any point in time.

Introduce two new USB core API calls, usb_disable_lpm() and
usb_enable_lpm().  These functions increment and decrement a new
variable in the usb_device, lpm_disable_count.  If usb_disable_lpm()
fails, it will call usb_enable_lpm() in order to balance the
lpm_disable_count.

These two new functions must be called with the bandwidth_mutex locked.
If the bandwidth_mutex is not already held by the caller, it should
instead call usb_unlocked_disable_lpm() and usb_enable_lpm(), which take
the bandwidth_mutex before calling usb_disable_lpm() and
usb_enable_lpm(), respectively.

Introduce a new variable (timeout) in the usb3_lpm_params structure to
keep track of the currently enabled U1/U2 timeout values.  When
usb_disable_lpm() is called, and the USB device has the U1 or U2
timeouts set to a non-zero value (meaning either device-initiated or
hub-initiated LPM is enabled), attempt to disable LPM, regardless of the
state of the lpm_disable_count.  We want to ensure that all callers can
be guaranteed that LPM is disabled if usb_disable_lpm() returns zero.

Otherwise the following scenario could occur:

1. Driver A is being bound to interface 1.  usb_probe_interface()
disables LPM.  Driver A doesn't care if hub-initiated LPM is enabled, so
even though usb_disable_lpm() fails, the probe of the driver continues,
and the bandwidth mutex is dropped.

2. Meanwhile, Driver B is being bound to interface 2.
usb_probe_interface() grabs the bandwidth mutex and calls
usb_disable_lpm().  That call should attempt to disable LPM, even
though the lpm_disable_count is set to 1 by Driver A.

For usb_enable_lpm(), we attempt to enable LPM only when the
lpm_disable_count is zero.  If some step in enabling LPM fails, it will
only have a minimal impact on power consumption, and all USB device
drivers should still work properly.  Therefore don't bother to return
any error codes.

Don't enable device-initiated LPM if the device is unconfigured.  The
USB device will only accept the U1/U2_ENABLE control transfers in the
configured state.  Do enable hub-initiated LPM in that case, since
devices are allowed to accept the LGO_Ux link commands in any state.

Don't enable or disable LPM if the device is marked as not being LPM
capable.  This can happen if:
 - the USB device doesn't have a SS BOS descriptor,
 - the device's parent hub has a zeroed bHeaderDecodeLatency value, or
 - the xHCI host doesn't support LPM.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Andiry Xu <andiry.xu@amd.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:41:58 -07:00
Sarah Sharp
8afa408cba USB: Allow drivers to disable hub-initiated LPM.
USB 3.0 Link Power Management (LPM) is designed to allow individual
links in the bus to go into lower power states.  There are two ways a
link can enter a lower power state:

1. Device-initiated LPM.  When a USB device decides it can go into a
lower power link state, it sends a message to the parent hub, telling it
to go into either U1 or U2.  Device-initiated LPM is good for devices
that send data to the host, like communications devices.

2. Hub-initiated LPM.  After the link has been idle for a specific
amount of time, the parent hub will request that the child go into a
lower power state.  The child can refuse that request.  For example, a
USB modem may want to refuse the LPM request if it is in the middle of
receiving a text message.  Hub-initiated LPM is good for devices where
only the host initiates the data transfer, like USB printers or USB mass
storage devices.

Links will be automatically placed into higher power states by the USB
hubs and roothubs whenever the host starts a USB transmission.

Introduce a new usb_driver flag, disable_hub_initiated_lpm, that allows
drivers to disable hub-initiated LPM.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Hansjoerg Lipp <hjlipp@web.de>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Oliver Neukum <oliver@neukum.name>
Cc: Peter Korsgaard <jacmet@sunsite.dk>
Cc: Jan Dumon <j.dumon@option.com>
Cc: Petko Manolov <petkan@users.sourceforge.net>
Cc: Steve Glendinning <steve.glendinning@smsc.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Christian Lamparter <chunkeey@googlemail.com>
Cc: Brett Rudley <brudley@broadcom.com>
Cc: Roland Vossen <rvossen@broadcom.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: "Franky (Zhenhui) Lin" <frankyl@broadcom.com>
Cc: Kan Yan <kanyan@broadcom.com>
Cc: Dan Williams <dcbw@redhat.com>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: Herton Ronaldo Krzesinski <herton@canonical.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Chaoming Li <chaoming_li@realsil.com.cn>
Cc: Daniel Drake <dsd@gentoo.org>
Cc: Ulrich Kunitz <kune@deine-taler.de>
Cc: linux-bluetooth@vger.kernel.org
Cc: gigaset307x-common@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: ath9k-devel@lists.ath9k.org
Cc: libertas-dev@lists.infradead.org
Cc: users@rt2x00.serialmonkey.com
2012-05-18 15:41:57 -07:00
Sarah Sharp
51e0a01206 USB: Calculate USB 3.0 exit latencies for LPM.
There are several different exit latencies associated with coming out of
the U1 or U2 lower power link state.

Device Exit Latency (DEL) is the maximum time it takes for the USB
device to bring its upstream link into U0.  That can be found in the
SuperSpeed Extended Capabilities BOS descriptor for the device.  The
time it takes for a particular link in the tree to exit to U0 is the
maximum of either the parent hub's U1/U2 DEL, or the child's U1/U2 DEL.

Hubs introduce a further delay that effects how long it takes a child
device to transition to U0.  When a USB 3.0 hub receives a header
packet, it takes some time to decode that header and figure out which
downstream port the packet was destined for.  If the port is not in U0,
this hub header decode latency will cause an additional delay for
bringing the child device to U0.  This Hub Header Decode Latency is
found in the USB 3.0 hub descriptor.

We can use DEL and the header decode latency, along with additional
latencies imposed by each additional hub tier, to figure out the exit
latencies for both host-initiated and device-initiated exit to U0.

The Max Exit Latency (MEL) is the worst-case time it will take for a
host-initiated exit to U0, based on whether U1 or U2 link states are
enabled.  The ping or packet must traverse the path to the device, and
each hub along the way incurs the hub header decode latency in order to
figure out which device the transfer was bound for.  We say worst-case,
because some hubs may not be in the lowest link state that is enabled.
See the examples in section C.2.2.1.

Note that "HSD" is a "host specific delay" that the power appendix
architect has not been able to tell me how to calculate.  There's no way
to get HSD from the xHCI registers either, so I'm simply ignoring it.

The Path Exit Latency (PEL) is the worst-case time it will take for a
device-initiate exit to U0 to place all the links from the device to the
host into U0.

The System Exit Latency (SEL) is another device-initiated exit latency.
SEL is useful for USB 3.0 devices that need to send data to the host at
specific intervals.  The device may send an NRDY to indicate it isn't
ready to send data, then put its link into a lower power state.  If it
needs to have that data transmitted at a specific time, it can use SEL
to back calculate when it will need to bring the link back into U0 to
meet its deadlines.

SEL is the worst-case time from the device-initiated exit to U0, to when
the device will receive a packet from the host controller.  It includes
PEL, the time it takes for an ERDY to get to the host, a host-specific
delay for the host to process that ERDY, and the time it takes for the
packet to traverse the path to the device.  See Figure C-2 in the USB
3.0 bus specification.

Note: I have not been able to get good answers about what the
host-specific delay to process the ERDY should be.  The Intel HW
developers say it will be specific to the platform the xHCI host is
integrated into, and they say it's negligible.  Ignore this too.

Separate from these four exit latencies are the U1/U2 timeout values we
program into the parent hubs.  These timeouts tell the hub to attempt to
place the device into a lower power link state after the link has been
idle for that amount of time.

Create two arrays (one for U1 and one for U2) to store mel, pel, sel,
and the timeout values.  Store the exit latency values in nanosecond
units, since that's the smallest units used (DEL is in us, but the Hub
Header Decode Latency is in ns).

If a USB 3.0 device doesn't have a SuperSpeed Extended Capabilities BOS
descriptor, it's highly unlikely it will be able to handle LPM requests
properly.  So it's best to disable LPM for devices that don't have this
descriptor, and any children beneath it, if it's a USB 3.0 hub.  Warn
users when that happens, since it means they have a non-compliant USB
3.0 device or hub.

This patch assumes a simplified design where links deep in the tree will
not have U1 or U2 enabled unless all their parent links have the
corresponding LPM state enabled.  Eventually, we might want to allow a
different policy, and we can revisit this patch when that happens.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
2012-05-18 15:41:56 -07:00
Sarah Sharp
d9b2099cd6 USB: Refactor code to set LPM support flag.
Refactor the code that sets the usb_device flag to indicate the device
support link power management (lpm_capable).  The current code sets
lpm_capable unconditionally if the USB devices have a USB 2.0 Extended
Capabilities Descriptor.  USB 3.0 devices can also have that descriptor,
but the xHCI driver code that uses lpm_capable will not run the USB 2.0
LPM test for devices under the USB 3.0 roothub.  Therefore, it's fine
only set lpm_capable for high speed devices in this refactoring.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:41:54 -07:00
Sarah Sharp
448b6eb1e0 USB: Make sure to fetch the BOS desc for roothubs.
The BOS descriptor is normally fetched and stored in the usb_device->bos
during enumeration.  USB 3.0 roothubs don't undergo enumeration, but we
need them to have a BOS descriptor, since each xHCI host has a different
U1 and U2 exit latency.  Make sure to fetch the BOS descriptor for USB
3.0 roothubs.  It will be freed when the roothub usb_device is released.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Andiry Xu <andiry.xu@amd.com>
2012-05-18 15:41:53 -07:00
Sarah Sharp
797b0ca5e6 xhci: Add roothub code to set U1/U2 timeouts.
USB 3.0 hubs can be put into a mode where the hub can automatically
request that the link go into a deeper link power state after the link
has been idle for a specified amount of time.  Each of the new USB 3.0
link states (U1 and U2) have their own timeout that can be programmed
per port.

Change the xHCI roothub emulation code to handle the request to set the
U1 and U2 timeouts.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2012-05-18 15:41:52 -07:00
Sarah Sharp
33b2831ac8 xhci: Reset reserved command ring TRBs on cleanup.
When the xHCI driver needs to clean up memory (perhaps due to a failed
register restore on resume from S3 or resume from S4), it needs to reset
the number of reserved TRBs on the command ring to zero.  Otherwise,
several resume cycles (about 30) with a UAS device attached will
continually increment the number of reserved TRBs, until all command
submissions fail because there isn't enough room on the command ring.

This patch should be backported to kernels as old as 2.6.32,
that contain the commit 913a8a344f
"USB: xhci: Change how xHCI commands are handled."

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org
2012-05-18 15:41:51 -07:00
Oliver Neukum
f8a9e72d12 USB: fix resource leak in xhci power loss path
Some more data structures must be freed and counters
reset if an XHCI controller has lost power. The failure
to do so renders some chips inoperative after a certain number
of S4 cycles.

This patch should be backported to kernels as old as 3.2,
that contain the commits c29eea6219
"xhci: Implement HS/FS/LS bandwidth checking." and
commit 839c817ce6
"xhci: Implement HS/FS/LS bandwidth checking."

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org
2012-05-18 15:41:39 -07:00
Linus Torvalds
30a08bf2d3 proc: move fd symlink i_mode calculations into tid_fd_revalidate()
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.

Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around.  Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.

Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a2878402: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds.  And it
just makes sense to update i_mode in the same place we update i_uid/gid.

Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead.  However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor.  We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18 14:06:17 -07:00
Vipul Pandya
67bbc05512 RDMA/cxgb4: Add query_qp support
This allows querying the QP state before flushing.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:37 -07:00
Vipul Pandya
ec3eead217 RDMA/cxgb4: Remove kfifo usage
Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs.  So replace it with a simple bitmap
allocator.

Remove IDs from the IDR tables before deallocating them.  This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:36 -07:00
Vipul Pandya
d716a2a014 RDMA/cxgb4: Use vmalloc() for debugfs QP dump
This allows dumping thousands of QPs.  Log active open failures of
interest.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:35 -07:00
Vipul Pandya
422eea0a8c RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
Add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode.  Automatically transition
to/from flow control mode when the active qp count crosses
db_fc_theshold.

Add more db debugfs stats

On DB DROP event from the LLD, recover all the iwarp queues.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:33 -07:00
Vipul Pandya
4984037bef RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
Use GFP_ATOMIC in _insert_handle() if ints are disabled.

Don't panic if we get an abort with no endpoint found.  Just log a
warning.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:32 -07:00
Vipul Pandya
2c97478106 RDMA/cxgb4: Add DB Overflow Avoidance
Get FULL/EMPTY/DROP events from LLD.  On FULL event, disable normal
user mode DB rings.

Add modify_qp semantics to allow user processes to call into the
kernel to ring doobells without overflowing.

Add DB Full/Empty/Drop stats.

Mark queues when created indicating the doorbell state.

If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.

Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:31 -07:00
Vipul Pandya
8d81ef34b2 RDMA/cxgb4: Add debugfs RDMA memory stats
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:29 -07:00
Vipul Pandya
3069ee9bc4 cxgb4: DB Drop Recovery for RDMA and LLD queues
recover LLD EQs for DB drop interrupts.  This includes adding a new
db_lock, a spin lock disabling BH too, used by the recovery thread and
the ring_tx_db() paths to allow db drop recovery.

Clean up initial DB avoidance code.

Add read_eq_indices() - this allows the LLD to use the PCIe mw to
efficiently read hw eq contexts.

Add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw
pidx value.

Add flush_eq_cache() and cxgb4_flush_eq_cache().  This allows iw_cxgb4
to flush the sge eq context cache before beginning db drop recovery.

Add module parameter, dbfoifo_int_thresh, to allow tuning the db
interrupt threshold value.

Add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.

Add module parameter, dbfoifo_drain_delay, to allow tuning the amount
of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:28 -07:00
Vipul Pandya
8caa1e8446 cxgb4: Common platform specific changes for DB Drop Recovery
Add platform-specific callback functions for interrupts.  This is
needed to do a single read-clear of the CAUSE register and then call
out to platform specific functions for DB threshold interrupts and DB
drop interrupts.

Add t4_mem_win_read_len() - mem-window reads for arbitrary lengths.
This is used to read the CIDX/PIDX values from EC contexts during DB
drop recovery.

Add t4_fwaddrspace_write() - sends addrspace write cmds to the fw.
Needed to flush the sge eq context cache.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:27 -07:00
Vipul Pandya
881806bc15 cxgb4: Detect DB FULL events and notify RDMA ULD
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 13:22:25 -07:00
Steven Rostedt
683a3e6481 ktest: Fix kernelrevision with POST_BUILD
The PRE_BUILD and POST_BUILD options of ktest are added to allow the
user to add temporary patch to the system and remove it on builds. This
is sometimes use to take a change from another git branch and add it to
a series without the fix so that this series can be tested, when an
unrelated bug exists in the series.

The problem comes when a tagged commit is being used. For example, if
v3.2 is being tested, and we add a patch to it, the kernelrelease for
that commit will be 3.2.0+, but without the patch the version will be
3.2.0. This can cause problems when the kernelrelease is determined for
creating the /lib/modules directory. The kernel booting has the '+' but
the module directory will not, and the modules will be missing for that
boot, and may not allow the kernel to succeed.

The fix is to put the creation of the kernelrelease in the POST_BUILD
logic, before it applies the POST_BUILD operation. The POST_BUILD is
where the patch may be removed, removing the '+' from the kernelrelease.

The calculation of the kernelrelease will also stay in its current
location but will be ignored if it was already calculated previously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-18 14:27:51 -04:00
John Johansen
cffee16e8b apparmor: fix long path failure due to disconnected path
BugLink: http://bugs.launchpad.net/bugs/955892

All failures from __d_path where being treated as disconnected paths,
however __d_path can also fail when the generated pathname is too long.

The initial ENAMETOOLONG error was being lost, and ENAMETOOLONG was only
returned if the subsequent dentry_path call resulted in that error.  Other
wise if the path was split across a mount point such that the dentry_path
fit within the buffer when the __d_path did not the failure was treated
as a disconnected path.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-05-18 11:09:52 -07:00
John Johansen
bf83208e0b apparmor: fix profile lookup for unconfined
BugLink: http://bugs.launchpad.net/bugs/978038

also affects apparmor portion of
BugLink: http://bugs.launchpad.net/bugs/987371

The unconfined profile is not stored in the regular profile list, but
change_profile and exec transitions may want access to it when setting
up specialized transitions like switch to the unconfined profile of a
new policy namespace.

Signed-off-by: John Johansen <john.johansen@canonical.com>
2012-05-18 11:09:28 -07:00
Eric Dumazet
d4b1133558 pktgen: fix module unload for good
commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.

1) list_splice() arguments were in the wrong order

2) list_splice(list, head) has undefined behavior if head is not
initialized.

3) We should use the list_splice_init() variant to clear pktgen_threads
list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:54:33 -04:00
Somnath Kotur
941a77d582 be2net: Fix to allow get/set of debug levels in the firmware.
Patch re-spin.
Incorporated review comments by Ben Hutchings.

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:33:32 -04:00
Chris Metcalf
e6d9668e11 tilegx: enable SYSCALL_WRAPPERS support
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values.  The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.

On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6.  The general issue is tracked as CVE-2009-0029.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-18 13:33:24 -04:00
Eric Dumazet
d7f7c0ac11 ipv6: remove csummode in ip6_append_data()
csummode variable is always CHECKSUM_NONE in ip6_append_data()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:31:25 -04:00
Eric Dumazet
6f532612cc net: introduce netdev_alloc_frag()
Fix two issues introduced in commit a1c7fff7e1
( net: netdev_alloc_skb() use build_skb() )

- Must be IRQ safe (non NAPI drivers can use it)
- Must not leak the frag if build_skb() fails to allocate sk_buff

This patch introduces netdev_alloc_frag() for drivers willing to
use build_skb() instead of __netdev_alloc_skb() variants.

Factorize code so that :
__dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and
dev_alloc_skb() a wrapper around netdev_alloc_skb()

Use __GFP_COLD flag.

Almost all network drivers now benefit from skb->head_frag
infrastructure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:31:25 -04:00
Eric Dumazet
56138f50d1 iwlwifi: dont pull too much payload in skb head
As iwlwifi use fat skbs, it should not pull too much data in skb->head,
and particularly no tcp data payload, or splice() is slower, and TCP
coalescing is disabled. Copying payload to userland also involves at
least two copies (part from header, part from fragment)

Each layer will pull its header from the fragment as needed.

(on 64bit arches, skb_tailroom(skb) at this point is 192 bytes)

With this patch applied, I have a major reduction of collapsed/pruned
TCP packets, a nice increase of TCPRcvCoalesce counter, and overall
better Internet User experience.

Small packets are still using a fragless skb, so that page can be reused
by the driver.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:31:25 -04:00
Linus Torvalds
3d9944978e Fix machine check recovery
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPtS5qAAoJEKurIx+X31iB+bAP/1FjUa2Nd53X89HFc6DoLktA
 4AshM/JENhAfSbpTfGhg10ZOuwUa8sQ85Sf6yz1CsW6mEiJK/bDFrR1g2KrmejyL
 owgQvV6ukPABzfB27tWyXSVVBPmLkJviedLDautVpgPEPVuqauntmpe7fW51b5pf
 92MxvYZ6AzgbDIjVaXP7e+kPomvgyM1C/UEvCgoyEcw81h5dchU9NSdXNBS67JS/
 uOsMiMJyoNI46haYYbyFgMq3RmpYuxTLFj7qFDlUltyjP+vIvyLs38Ae/vkRMNfV
 sYXWRUQlRpvqg4MDIFVZx8FWTufzTm0BMS+Be7JkXKWdF3DAksq6FprOWIxfYi+d
 PMxwTFeSJzTINb9n9MiLt3TmuRy3mu37QWd28qJaJciNMkYWbclPqyJmjwsuAMKg
 hKSy2FvewIDHTAGOkwaVjS+L8O7j3TNRIAbweFA1d1K4rt6oSfwdn6GZrAb6MTvx
 oV0Fe1nAyY9mucyjknBTim3RYZ9qQ7H9SjL8JoaGihSzi988MgBun+iEQOQiwjNl
 YJNGIv0wb31qCKFtxU0A8rA0sFdhQRoLgigJfETb5a+2ghtG323oH6ZVWTnB8bp/
 g1XOpus222/iJdL2AizMiJeUNV1IZ0SeLn2GoTFQIy9Uf11Zdgu5wLJumUva8EC6
 JAACbpBlYufftIn278OH
 =tk2M
 -----END PGP SIGNATURE-----

Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull a machine check recovery fix from Tony Luck.

I really don't like how the MCE code does some of the things it does,
but this does seem to be an improvement.

* tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Only restart instruction after machine check recovery if it is safe
2012-05-18 09:42:20 -07:00
Graeme Gregory
c948ef3ae7 mfd: palmas PMIC device support Kconfig
Add the new palmas MFD to Kconfig and Makefile

Signed-off-by: Graeme Gregory <gg@slimlogic.co.uk>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 16:54:48 +01:00
Graeme Gregory
2945fbc2fc mfd: palmas PMIC device support
Palmas is a PMIC from Texas Instruments and this is the MFD part of the
driver for this chip. The PMIC has SMPS and LDO regulators, a general
purpose ADC, GPIO, USB OTG mode detection, watchdog and RTC features.

Signed-off-by: Graeme Gregory <gg@slimlogic.co.uk>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 16:54:47 +01:00
Lee Jones
1bdd670a32 regulator: Enable Device Tree for the db8500-prcmu regulator driver
Here we use the previous regulator register code separated from probe to
register each of the regulators mentioned in Device Tree.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 16:38:00 +01:00
Lee Jones
8986cf8852 regulator: db8500-prcmu: Separate regulator registration from probe
This will provide us with a convenient way to register regulators when
booting with Device Tree both enabled & disabled and will save us a
great deal of code duplication in time.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 16:36:36 +01:00
Mark Salter
8ff98b9c99 C6X: remove unused config items
Signed-off-by: Mark Salter <msalter@redhat.com>
2012-05-18 09:59:22 -04:00
Arnd Bergmann
396d81cd0f Merge branch 'at91/dt' into next/dt
This should help the merge with the at91 adc driver that is currently
in the staging tree.

* at91/dt:
  ARM: at91: Add ADC driver to at91sam9260/at91sam9g20 dtsi files

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-05-18 15:04:12 +02:00
Nicolas Ferre
73d68d91aa ARM: at91: Add ADC driver to at91sam9260/at91sam9g20 dtsi files
Now that the bulk of at91sam9g20-related nodes are located in at91sam9260.dtsi,
we have to re-create the path to this ADC node for SoC specific parts.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-05-18 15:02:56 +02:00
Axel Lin
b13296d070 regulator: ab3100: Use regulator_map_voltage_iterate()
regulator_map_voltage_iterate() is for drivers implementing set_voltage_sel()
and list_voltage() to use it as their map_voltage() operation.

In this case, regulator_map_voltage_iterate() happen to be doing the same thing
as ab3100_get_best_voltage_index() function. So we can use it to replace
ab3100_get_best_voltage_index() function.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 08:39:57 +01:00
Axel Lin
8b22285fd7 regulator: tps65217: Convert to set_voltage_sel and map_voltage
Convert tps65217_pmic_ops to use set_voltage_sel and map_voltage.
After this convertion, we can also use tps65217_pmic_set_voltage_sel()
for set_voltage_sel callback of tps65217_pmic_ldo1_ops.
Thus this patch also removes tps65217_pmic_ldo1_set_voltage_sel() function.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 08:38:32 +01:00
Lee Jones
3a8334b948 regulator: Enable the ab8500 for Device Tree
Here we setup the ab8500 regulator driver for DT. We first do
this in the normal way, by providing a match structure during
initialisation, but then we provide information so that
whilst probing we can use existing data structures to do DT
look-ups.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 08:37:25 +01:00
Lee Jones
a7ac1d9e4e regulator: ab8500: Split up probe() into manageable pieces
ab8500's probe() function is becoming quite large, so in the lead
up to Device Tree enablement which will fork the thread of execution
this patch splits it into 3 main areas; basic error checking will
remain in probe(), but regulator register initialisation and regulator
registration have been moved to their own functions which will
be called in sequence by probe() and the DT equivalent.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2012-05-18 08:36:45 +01:00
Eric Dumazet
92113bfde2 ipv6: bool conversions phase1
ipv6_opt_accepted() returns a bool, and can use const pointers

ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 02:24:13 -04:00
Eric Dumazet
cbc264cacd ip_frag: struct inet_frags match() method returns a bool
- match() method returns a boolean
- return (A && B && C && D) -> return A && B && C && D
- fix indentation

Signed-off-by: Eric Dumazet <edumazet@google.com>
2012-05-18 01:40:27 -04:00
David S. Miller
28e85100ae net: Remove netdevice ec_ptr, no longer used.
ECONET is gone, thus this can be deleted as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 01:39:43 -04:00
Stephen Hemminger
349f29d841 econet: remove ancient bug ridden protocol
More spring cleaning!

The ancient Econet protocol should go. Most of the bug fixes in recent
years have been fixing security vulnerabilities. The hardware hasn't
been made since the 90s, it is only interesting as an archeological curiosity.

For the truly curious, or insomniac, go read up on it.
  http://en.wikipedia.org/wiki/Econet

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 01:35:08 -04:00
Paul Gortmaker
93c2d656c7 frv: delete incorrect task prototypes causing compile fail
Commit 41101809a8 ("fork: Provide weak arch_release_[task_struct|
thread_info] functions") in -tip highlights a problem in the frv arch,
where it has needles prototypes for alloc_task_struct_node and
free_task_struct.  This now shows up as:

  kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
  kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration

since that commit turned them into real functions.  Since arch/frv does
does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e.  it just
uses the generic ones) it shouldn't list these at all.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
majianpeng
02e1a9cd1e slub: missing test for partial pages flush work in flush_all()
I found some kernel messages such as:

    SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
    Pid: 6143, comm: mdadm Tainted: G           O 3.4.0-rc6+        #75
    Call Trace:
    kmem_cache_destroy+0x328/0x400
    free_conf+0x2d/0xf0 [raid456]
    stop+0x41/0x60 [raid456]
    md_stop+0x1a/0x60 [md_mod]
    do_md_stop+0x74/0x470 [md_mod]
    md_ioctl+0xff/0x11f0 [md_mod]
    blkdev_ioctl+0xd8/0x7a0
    block_ioctl+0x3b/0x40
    do_vfs_ioctl+0x96/0x560
    sys_ioctl+0x91/0xa0
    system_call_fastpath+0x16/0x1b

Then using kmemleak I found these messages:

    unreferenced object 0xffff8800b6db7380 (size 112):
      comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
      hex dump (first 32 bytes):
        01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff  .....N..........
        ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff  .........@J.....
      backtrace:
        kmemleak_alloc+0x21/0x50
        kmem_cache_alloc+0xeb/0x1b0
        kmem_cache_open+0x2f1/0x430
        kmem_cache_create+0x158/0x320
        setup_conf+0x649/0x770 [raid456]
        run+0x68b/0x840 [raid456]
        md_run+0x529/0x940 [md_mod]
        do_md_run+0x18/0xc0 [md_mod]
        md_ioctl+0xba8/0x11f0 [md_mod]
        blkdev_ioctl+0xd8/0x7a0
        block_ioctl+0x3b/0x40
        do_vfs_ioctl+0x96/0x560
        sys_ioctl+0x91/0xa0
        system_call_fastpath+0x16/0x1b

This bug was introduced by commit a8364d5555 ("slub: only IPI CPUs that
have per cpu obj to flush"), which did not include checks for per cpu
partial pages being present on a cpu.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Cyrill Gorcunov
eb94cd96e0 fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.4.0-rc4-24406-g841e6a6 #121 Not tainted
  -------------------------------------------------------
  bash/1556 is trying to acquire lock:
   (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1

  but task is already holding lock:
   (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&sig->cred_guard_mutex){+.+.+.}:
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_killable_nested+0x40/0x45
         lock_trace+0x24/0x59
         proc_map_files_lookup+0x5a/0x165
         __lookup_hash+0x52/0x73
         do_lookup+0x276/0x2b1
         walk_component+0x3d/0x114
         do_last+0xfc/0x540
         path_openat+0xd3/0x306
         do_filp_open+0x3d/0x89
         do_sys_open+0x74/0x106
         sys_open+0x21/0x23
         tracesys+0xdd/0xe2

  -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
         check_prev_add+0x6a/0x1ef
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_nested+0x40/0x45
         do_lookup+0x267/0x2b1
         walk_component+0x3d/0x114
         link_path_walk+0x1f9/0x48f
         path_openat+0xb6/0x306
         do_filp_open+0x3d/0x89
         open_exec+0x25/0xa0
         do_execve_common+0xea/0x2f9
         do_execve+0x43/0x45
         sys_execve+0x43/0x5a
         stub_execve+0x6c/0xc0

This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.

Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00