Commit Graph

321314 Commits

Author SHA1 Message Date
Jonas Gorski
19c860d932 MIPS: BCM63XX: Add PCIe Support for BCM6328
Add support for the PCIe port found on BCM6328.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3956/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:13 +02:00
Jonas Gorski
76f42fe811 MIPS: BCM63XX: Move the PCI initialization into its own function
Also make the cpu check a bit more explicit.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3953/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:13 +02:00
Jonas Gorski
e5766aea5b MIPS: BCM63XX: Add basic BCM6328 support
This includes CPU speed, memory size detection and working UART, but
lacking the appropriate drivers, no support for attached flash.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3951/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:12 +02:00
Jonas Gorski
288752a8aa MIPS: BCM63XX: Use the Chip ID register for identifying the SoC
Newer BCM63XX SoCs use virtually the same CPU ID, differing only in the
revision bits. But since they all have the Chip ID register at the same
location, we can use that to identify the SoC we are running on.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3955/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:12 +02:00
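The Chip ID approach described in 288752a8aa can be pictured with a minimal C sketch. It assumes the register carries the chip ID in its upper 16 bits and the revision in its low 8 bits; that layout, the mask names and decode_chipid() are illustrative assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Assumed layout, for illustration only:
 * bits 31..16 = chip ID, bits 7..0 = revision. */
#define CHIPID_SHIFT 16
#define CHIPID_MASK  (0xffffu << CHIPID_SHIFT)
#define REVID_MASK   0xffu

struct soc_ident {
	uint16_t chip_id;
	uint8_t  rev;
};

/* Hypothetical helper: split a raw register value into chip ID and revision. */
static struct soc_ident decode_chipid(uint32_t reg)
{
	struct soc_ident id;

	id.chip_id = (uint16_t)((reg & CHIPID_MASK) >> CHIPID_SHIFT);
	id.rev = (uint8_t)(reg & REVID_MASK);
	return id;
}
```

Because the chip ID is read from a dedicated register rather than derived from the CPU ID, SoCs that share a CPU ID can still be told apart.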
Jonas Gorski
aaf3fedb56 MIPS: BCM63XX: Add flash type detection
On BCM6358 and BCM6368 the attached flash type is exposed through a
bootstrapping register. Use it for auto detecting the flash type on
those and default to parallel flash for earlier SoCs.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3954/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:11 +02:00
Jonas Gorski
4b897d5483 MIPS: BCM63XX: Move flash registration out of board_bcm963xx.c
board_bcm963xx.c is already large enough.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: Kevin Cernekee <cernekee@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/3952/
Reviewed-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:11 +02:00
Florian Fainelli
553072b27e hw_random: add Broadcom BCM63xx RNG driver
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3327/
Patchwork: https://patchwork.linux-mips.org/patch/4072/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:11 +02:00
Florian Fainelli
b73ab84199 MIPS: BCM63XX: add RNG driver platform_device stub
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3325/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:10 +02:00
Florian Fainelli
8aecfe9462 MIPS: BCM63XX: add RNG peripheral definitions
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3326/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:10 +02:00
Florian Fainelli
0b55561bc6 MIPS: BCM63XX: add support for "ipsec" clock
This module is only available on BCM6368 so far and does not require
resetting the block.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Cc: mpm@selenic.com
Cc: herbert@gondor.apana.org.au
Patchwork: https://patchwork.linux-mips.org/patch/3324/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:09 +02:00
David Daney
a03822ea5d MIPS: OCTEON: Remove some unused files.
These FPA related files are not used anywhere in the kernel.  Remove
them.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3892/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:09 +02:00
Florian Fainelli
94c58b7f23 MIPS: BCM63XX: Fix platform_devices id
There is only one watchdog and one VoIP DSP platform device per board, so use
-1 as the platform_device id accordingly.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3313/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24 16:33:09 +02:00
Lars Ellenberg
a73ff3231d drbd: announce FLUSH/FUA capability to upper layers
Unconditionally announce FLUSH/FUA to upper layers.
If the lower layers on either node do not actually support this,
generic_make_request() will deal with it.

If this causes performance regressions on your setup,
make sure there are no volatile caches involved,
and mount -o nobarrier or equivalent.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 15:14:28 +02:00
Lars Ellenberg
db141b2f42 drbd: fix max_bio_size to be unsigned
We capped our max_bio_size respectively max_hw_sectors with
min_t(int, lower level limit, our limit);
unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their
limits to "-1U", and that is of course a smaller "int" value than our limit.

Impact: we started to request 16 MB resync requests,
which led to a protocol error and a reconnect loop.

Fix all relevant constants and parameters to be unsigned int.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 15:14:00 +02:00
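The signedness trap described in db141b2f42 can be shown in plain C. This is a sketch, not the drbd code: min_as_int() stands in for what min_t(int, a, b) does, and OUR_LIMIT is a hypothetical value chosen only for the demonstration.

```c
#include <limits.h>

#define OUR_LIMIT (128u * 1024u)	/* hypothetical upper bound, in bytes */

/* What min_t(int, a, b) boils down to: both operands are converted to int
 * before the comparison. */
static int min_as_int(unsigned int a, unsigned int b)
{
	int x = (int)a;	/* UINT_MAX ("-1U") becomes -1 with GCC and Clang */
	int y = (int)b;

	return x < y ? x : y;
}

/* The unsigned comparison keeps "no limit" (UINT_MAX) as the largest value. */
static unsigned int min_as_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}
```

With a lower-level limit of UINT_MAX, the int version picks the lower-level value (now negative) instead of our own limit, while the unsigned version returns OUR_LIMIT as intended.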
Lars Ellenberg
7ee1fb93f3 drbd: flush drbd work queue before invalidate/invalidate remote
If you do back to back wait-sync/invalidate on a Primary in a tight loop,
during application IO load, you could trigger a race:
  kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync'
	but 'write from resync_finished' still pending?

Fix this by changing the order of the drbd_queue_work() and
the wake_up() in dec_ap_pending(), and adding the additional
drbd_flush_workqueue() before requesting the full sync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:15:58 +02:00
Lars Ellenberg
c12e9c8964 drbd: fix potential access after free
Occasionally, if we disconnect, we triggered this assert:
  block drbd7: ASSERT FAILED tl_hash[27] == c30b0f04, expected NULL

hlist_del() happens only on master bio completion.

We used to wait for pending IO to complete before freeing tl_hash
on disconnect. We no longer do so, since we learned to "freeze"
IO on disconnect.

If the local disk is too slow, we may reach C_STANDALONE early,
and there are still some requests pending locally when we call
drbd_free_tl_hash().

If we now free the tl_hash, and later the local IO completion completes
the master bio, that completion does hlist_del() and clobbers freed memory.

Do hlist_del_init() and hlist_add_fake() before kfree(tl_hash),
so the hlist_del() on master bio completion is harmless.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:15:16 +02:00
Laurent Pinchart
fb604a3d58 i2c-omap: Add support for I2C_M_STOP message flag
Generate a stop condition after each message marked with I2C_M_STOP.

[JD: Add I2C_FUNC_PROTOCOL_MANGLING.]

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:59 +02:00
Laurent Pinchart
72fc2c7f78 i2c: Fall back to emulated SMBus if the operation isn't supported natively
Adapter drivers might support only a subset of the SMBus operations
natively. Those drivers currently have to manually emulate unsupported
operations using I2C.

Make the i2c_smbus_xfer() function fall back to
i2c_smbus_xfer_emulated() when the adapter's .smbus_xfer() operation
returns -EOPNOTSUPP, like it already does when the .smbus_xfer()
operation isn't available at all.

[JD: Minor optimization.]

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:59 +02:00
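The fallback rule from 72fc2c7f78 can be sketched in self-contained C. The types and function names below (adapter_sketch, xfer, xfer_emulated, native_cmd0_only) are simplified stand-ins invented for this example; only the decision logic mirrors the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, stripped-down adapter: only the native SMBus hook. */
struct adapter_sketch {
	int (*smbus_xfer)(int command);	/* may be NULL */
};

static int emulated_calls;	/* counts fallbacks, for demonstration only */

/* Stand-in for the I2C-based emulation path. */
static int xfer_emulated(int command)
{
	(void)command;
	emulated_calls++;
	return 0;
}

/* Example native hook that implements command 0 and nothing else. */
static int native_cmd0_only(int command)
{
	return command == 0 ? 0 : -EOPNOTSUPP;
}

/* Try the native hook first; fall back to emulation when the hook is
 * missing or reports -EOPNOTSUPP. Any other result is returned as-is. */
static int xfer(const struct adapter_sketch *adap, int command)
{
	if (adap->smbus_xfer) {
		int ret = adap->smbus_xfer(command);

		if (ret != -EOPNOTSUPP)
			return ret;	/* success or a real error */
	}
	return xfer_emulated(command);
}
```

An adapter driver written this way only has to implement the operations its hardware handles natively and return -EOPNOTSUPP for the rest.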
Laurent Pinchart
d47726c521 i2c: Add SCCB support
SCCB is a serial communication bus developed by Omnivision. Its 2-wire
mode is very similar to SMBus byte data transactions, but requires the
controller to ignore the ACK bit and to insert a stop condition after
each message.

Add a device SCCB flag and a message stop flag to be passed to
controller drivers.

[JD: Kill rogue definition in go7007 driver.]

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:59 +02:00
Emmanuel Deloget
68a7602f09 i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converter
Robofuzz OSIF is a generic USB/I2C interface that embeds an ATMega8A
AVR-RISC microcontroller.

The device is based upon Till Harbaum's i2c-tiny-usb and although it
enhances the original design with further functionalities it still
maintains compatibility with it with respect to the USB/I2C interface.

Signed-off-by: Emmanuel Deloget <logout@free.fr>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:59 +02:00
Daniel Kurtz
d3ff6ce400 i2c-i801: Enable IRQ for byte_by_byte transactions
Byte-by-byte transactions are used primarily for accessing I2C devices
with an SMBus controller.  For these transactions, for each byte that is
read or written, the SMBus controller generates a BYTE_DONE IRQ.  The isr
reads/writes the next byte, and clears the IRQ flag to start the next byte.
On the penultimate IRQ, the isr also sets the LAST_BYTE flag.

There is no locking around the cmd/len/count/data variables, since the
I2C adapter lock ensures there are never multiple simultaneous transactions
for the same device, and the driver thread never accesses these variables
while interrupts might be occurring.

The end result is faster I2C block read and write transactions.

Note: This patch has only been tested and verified by doing I2C read and
write block transfers on Cougar Point 6 Series PCH, as well as I2C read
block transfers on ICH5.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:59 +02:00
Jean Delvare
29b608540b i2c-i801: Enable interrupts on ICH5/7/8/9/10
Enable interrupts on more devices. ICH5, ICH7(-M) and ICH10 have been
tested to work OK. ICH8 and ICH9 are expected to work just fine as
they are very close to ICH7 and ICH10.

Ultimately we want to enable this feature on at least every device
since the ICH5, but for now we limit the exposure. We'll enable it for
other devices if we don't get negative feedback.

As a bonus, let the user know when interrupts are used.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Daniel Kurtz <djkurtz@chromium.org>
2012-07-24 14:13:59 +02:00
Daniel Kurtz
636752bcb5 i2c-i801: Enable IRQ for SMBus transactions
Add a new 'feature' to i2c-i801 to enable using PCI interrupts.
When the feature is enabled, then an isr is installed for the device's
PCI IRQ.

An I2C/SMBus transaction is always terminated by one of the following
interrupt sources: FAILED, BUS_ERR, DEV_ERR, or on success: INTR.

When the isr fires for one of these cases, it sets the ->status variable
and wakes up the waitq.  The waitq then saves off the status code, and
clears ->status (in preparation for some future transaction).
The SMBus controller generates an INTR irq at the end of each
transaction where INTREN was set in the HST_CNT register.

No locking is needed around accesses to priv->status since all writes to
it are serialized: it is only ever set once in the isr at the end of a
transaction, and cleared while no interrupts can occur.  In addition, the
I2C adapter lock guarantees that entire I2C transactions for a single
adapter are always serialized.

For this patch, the INTREN bit is set only for SMBus block, byte and word
transactions, but not for I2C reads or writes.  The use of the DS
(BYTE_DONE) interrupt with byte-by-byte I2C transactions is implemented in
a subsequent patch.

The interrupt feature has only been enabled for COUGARPOINT hardware.
In addition, it is disabled if SMBus is using the SMI# interrupt.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:58 +02:00
Jean Delvare
6cad93c4bb i2c-i801: Consolidate polling
(Based on earlier work by Daniel Kurtz.)

Come up with a consistent, driver-wide strategy for event polling. For
intermediate steps of byte-by-byte block transactions, check for
BYTE_DONE or any error flag being set. At the end of every transaction
(regardless of PEC being used), check for both BUSY being cleared and
INTR or any error flag being set. This ensures proper action for all
transaction types.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Daniel Kurtz <djkurtz@chromium.org>
2012-07-24 14:13:58 +02:00
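The two completion conditions described in 6cad93c4bb can be written as a pair of predicates. This is a sketch under assumptions: the status bit values and the helper names are chosen for illustration and are not copied from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit values are assumptions for illustration. */
#define STS_HOST_BUSY 0x01
#define STS_INTR      0x02
#define STS_DEV_ERR   0x04
#define STS_BUS_ERR   0x08
#define STS_FAILED    0x10
#define STS_BYTE_DONE 0x80
#define STS_ERRORS    (STS_DEV_ERR | STS_BUS_ERR | STS_FAILED)

/* Intermediate step of a byte-by-byte block transaction:
 * done once BYTE_DONE or any error flag is set. */
static bool byte_step_done(uint8_t status)
{
	return (status & (STS_BYTE_DONE | STS_ERRORS)) != 0;
}

/* End of a transaction: BUSY must be cleared AND
 * INTR or any error flag must be set. */
static bool transaction_done(uint8_t status)
{
	return !(status & STS_HOST_BUSY) &&
	       (status & (STS_INTR | STS_ERRORS)) != 0;
}
```

A polling loop would re-read the status register until the relevant predicate holds or a timeout expires.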
Daniel Kurtz
37af871112 i2c-i801: Drop ENABLE_INT9
Later patches enable interrupts.  This preliminary patch removes the older
unsupported ENABLE_INT9 flag.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:58 +02:00
Daniel Kurtz
edbeea6383 i2c-i801: Rename some SMBHSTCNT bit constants
Rename the SMBHSTCNT register bit access constants to match the style of
other register bits.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:58 +02:00
Daniel Kurtz
70a1cc1952 i2c-i801: Check and return errors during byte-by-byte transfers
If an error is detected in the polling loop, abort the transaction and
return an error code.

 * DEV_ERR is set if the device does not respond with an acknowledge, and
the SMBus controller times out (minimum 25ms).
 * BUS_ERR is set if a bus arbitration collision is detected.  In other
words, when the SMBus controller tries to generate a START condition, but
detects that the SMBDATA is being held low, usually by another SMBus/I2C
master.
 * FAILED is only set if a transaction is stopped by software (using
the SMBHSTCNT KILL bit).

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:58 +02:00
Daniel Kurtz
0ba8b8bfd5 i2c-i801: Clear only status bits in HST_STS
Writing back the whole status register could clear unwanted bits.
In particular, it could clear the "INUSE_STS" bit, which is a
'hardware semaphore', that might be useful to use some day.
To prepare for this, let's ban writing back the whole status to register
HST_STS, of which this is the only instance.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:57 +02:00
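Why writing back the whole status register is risky (0ba8b8bfd5) can be illustrated with a simulated register. The sketch assumes write-one-to-clear semantics for every bit shown, including INUSE_STS; the bit values, the variable and the helper names are illustrative, not the driver's.

```c
#include <stdint.h>

/* Bit values are assumptions for illustration. */
#define STS_INTR      0x02
#define STS_DEV_ERR   0x04
#define STS_BUS_ERR   0x08
#define STS_FAILED    0x10
#define STS_INUSE     0x40	/* the "hardware semaphore" bit */
#define STS_BYTE_DONE 0x80
#define STATUS_FLAGS  (STS_INTR | STS_DEV_ERR | STS_BUS_ERR | \
		       STS_FAILED | STS_BYTE_DONE)

static uint8_t hst_sts;	/* simulated register contents */

/* Write-one-to-clear: every 1 bit written clears that bit. */
static void hst_sts_write(uint8_t val)
{
	hst_sts &= (uint8_t)~val;
}

/* Old pattern: writes back everything that was read, INUSE included. */
static void clear_by_full_writeback(void)
{
	hst_sts_write(hst_sts);
}

/* Safer pattern: mask the value so only status flags are written. */
static void clear_status_flags_only(void)
{
	hst_sts_write(hst_sts & STATUS_FLAGS);
}
```

With the masked write, a set INUSE bit survives the cleanup; with the full write-back it is cleared along with the status flags.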
Daniel Kurtz
efa3cb15ad i2c-i801: Refactor use of LAST_BYTE in i801_block_transaction_byte_by_byte
As a slight optimization, pull some logic out of the polling loop during
byte-by-byte transactions by just setting the I801_LAST_BYTE bit, as
defined in the i801 (PCH) datasheet, when reading the last byte of a
byte-by-byte I2C_SMBUS_READ.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:57 +02:00
Fabio Estevam
fda2f4af37 i2c-smbus: Use module_i2c_driver()
Using module_i2c_driver() makes the code smaller and cleaner.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:57 +02:00
Jean Delvare
9cd3f2e849 i2c/writing-clients: Mention module_i2c_driver()
Based on a previous patch from Peter Meerwald.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Peter Meerwald <p.meerwald@bct-electronic.com>
2012-07-24 14:13:57 +02:00
Andrew Armenia
2a2f7404a1 i2c-piix4: Support AMD auxiliary SMBus controller
Some AMD chipsets, such as the SP5100, have an auxiliary SMBus
controller with a second set of registers. This patch adds
support for this auxiliary controller.

Tested on ASUS KCMA-D8 motherboard.

Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:57 +02:00
Andrew Armenia
e154bf6fbf i2c-piix4: Separate registration and probing code
Some chipsets have multiple sets of SMBus registers each controlling a
separate SMBus. Supporting these chipsets properly will require registering
multiple I2C adapters for one piix4.

The code to initialize and register the i2c_adapter structure has been
separated from piix4_probe and allows registration of a piix4 adapter
given its base address. Note that the i2c_adapter and i2c_piix4_adapdata
structures are now dynamically allocated.

Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:56 +02:00
Andrew Armenia
14a8086d27 i2c-piix4: Eliminate piix4_smba global variable
Some chipsets have multiple sets of piix4-compatible SMBus registers.
Eliminating the global variable will allow these chipsets to be fully
supported.

Return value from piix4_setup and piix4_sb800_setup now returns the smba
value detected. This is stored in a struct i2c_piix4_adapdata. Thus
the global variable is eliminated.

Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:56 +02:00
Axel Lin
56f2178898 i2c/busses: Use module_pci_driver
Convert the drivers in drivers/i2c/busses/* to use the module_pci_driver()
macro which makes the code smaller and a bit simpler.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Olof Johansson <olof@lixom.net>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Tomoya MORINAGA <tomoya.rohm@gmail.com>
2012-07-24 14:13:56 +02:00
Guenter Roeck
83a638df36 i2c: Update Guenter Roeck's e-mail address
My old e-mail address won't be valid for much longer. Time to update it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24 14:13:56 +02:00
Lars Ellenberg
63a6d0bb3d drbd: call local-io-error handler early
In case we want to hard-reset from the local-io-error handler,
we need to call it before notifying the peer or aborting local IO.
Otherwise the peer will advance its data generation UUIDs even
if secondary.

This way, local io error looks like a "regular" node crash,
which reduces the number of different failure cases.
This may be useful in a bigger picture where crashed or otherwise
"misbehaving" nodes are automatically re-deployed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:10:41 +02:00
Lars Ellenberg
0029d62434 drbd: do not reset rs_pending_cnt too early
Fix asserts like
  block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 !

We reset the resync lru cache and related information (rs_pending_cnt),
once we successfully finished a resync or online verify, or if the
replication connection is lost.

We also need to reset it if a resync or online verify is aborted
because a lower level disk failed.

In that case the replication link is still established,
and we may still have packets queued in the network buffers
which want to touch rs_pending_cnt.

We do not have any synchronization mechanism to know for sure when all
such pending resync related packets have been drained.

To keep this counter from going negative (and violating the ASSERT that it
will always be >= 0), just do not reset it when we lose a disk.

It is good enough to make sure it is re-initialized before the next
resync can start: reset it when we re-attach a disk.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:09:53 +02:00
Lars Ellenberg
88437879fb drbd: reset congestion information before reporting it in /proc/drbd
We cache the congestion status in mdev->congestion_reason whenever
drbd_congested() was called.
Reset this cached info before reporting it when reading /proc/drbd.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:07:48 +02:00
Lars Ellenberg
c2ba686f35 drbd: report congestion if we are waiting for some userland callback
If the drbd worker thread is synchronously waiting for some userland
callback, we don't want some casual pageout to block on us.
Have drbd_congested() report congestion in that case.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:07:18 +02:00
Lars Ellenberg
383606e0de drbd: differentiate between normal and forced detach
Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:06:18 +02:00
Lars Ellenberg
d264580145 drbd: cleanup, remove two unused global flags
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24 14:02:41 +02:00
Peter Zijlstra
8323f26ce3 sched: Fix race in task_group()
Stefan reported a crash on a kernel before a3e5d1091c ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.

Looking at all that I found a lack of serialization and plain
wrong comments.

The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. It's not pretty,
but I can't really see another way given how all the cgroup
stuff works.

Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:58:20 +02:00
Srivatsa Vaddagiri
88b8dac0a1 sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
Current load balance scheme requires only one cpu in a
sched_group (balance_cpu) to look at other peer sched_groups for
imbalance and pull tasks towards itself from a busy cpu. Tasks
thus pulled by balance_cpu could later get picked up by cpus
that are in the same sched_group as that of balance_cpu.

This scheme however fails to pull tasks that are not allowed to
run on balance_cpu (but are allowed to run on other cpus in its
sched_group). That can affect fairness and in some worst case
scenarios cause starvation.

Consider a two core (2 threads/core) system running tasks as
below:

          Core0            Core1
         /     \          /     \
        C0     C1        C2     C3
        |      |         |      |
        v      v         v      v
        F0     T1        F1     [idle]
                         T2

 F0 = SCHED_FIFO task (pinned to C0)
 F1 = SCHED_FIFO task (pinned to C2)
 T1 = SCHED_OTHER task (pinned to C1)
 T2 = SCHED_OTHER task (pinned to C1 and C2)

F1 could become a cpu hog, which will starve T2 unless C1 pulls
it. Between C0 and C1 however, C0 is required to look for
imbalance between cores, which will fail to pull T2 towards
Core0. T2 will starve eternally in this case. The same scenario
can arise in presence of non-rt tasks as well (say we replace F1
with high irq load).

We tackle this problem by having balance_cpu move pinned tasks
to one of its sibling cpus (where they can run). We first check
if load balance goal can be met by ignoring pinned tasks,
failing which we retry move_tasks() with a new env->dst_cpu.

This patch modifies load balance semantics on who can move load
towards a given cpu in a given sched_domain.

Before this patch, a given_cpu or an ilb_cpu acting on behalf of
an idle given_cpu is responsible for moving load to given_cpu.

With this patch applied, balance_cpu can in addition decide on
moving some load to a given_cpu.

There is a remote possibility that excess load could get moved
as a result of this (balance_cpu and given_cpu/ilb_cpu deciding
*independently* and at *same* time to move some load to a
given_cpu). However we should see less of such conflicting
decisions in practice and moreover subsequent load balance
cycles should correct the excess load moved to given_cpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:58:06 +02:00
Prashanth Nageshappa
bbf18b1949 sched: Reset loop counters if all tasks are pinned and we need to redo load balance
While load balancing, if all tasks on the source runqueue are pinned,
we retry after excluding the corresponding source cpu. However, loop counters
env.loop and env.loop_break are not reset before retrying, which can lead
to failure in moving the tasks. In this patch we reset env.loop and
env.loop_break to their initial values before we retry.

Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06EEF.2090709@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:55:37 +02:00
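The effect described in bbf18b1949 can be reproduced with a toy model. This is a heavily simplified sketch: the structures, LOOP_MAX and both functions are invented for the example, env.loop_break is left out, and a single scan budget stands in for the real counters.

```c
#include <stddef.h>

#define LOOP_MAX 4	/* hypothetical scan budget per attempt */

struct env_sketch {
	unsigned int loop;	/* tasks examined so far */
};

struct rq_sketch {
	unsigned int nr_tasks;
	int all_pinned;		/* nonzero: no task may be moved */
};

/* Scan one source runqueue; return the number of tasks moved. */
static unsigned int move_tasks(struct env_sketch *env,
			       const struct rq_sketch *src)
{
	unsigned int moved = 0, i;

	for (i = 0; i < src->nr_tasks; i++) {
		if (env->loop++ >= LOOP_MAX)
			break;		/* scan budget exhausted */
		if (!src->all_pinned)
			moved++;
	}
	return moved;
}

/* Try each source in turn; optionally reset the counter before a retry. */
static unsigned int balance(const struct rq_sketch *rqs, size_t n,
			    int reset_on_retry)
{
	struct env_sketch env = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int moved;

		if (reset_on_retry)
			env.loop = 0;
		moved = move_tasks(&env, &rqs[i]);
		if (moved)
			return moved;
	}
	return 0;
}
```

If the first source holds only pinned tasks and uses up the budget, the retry on the second source moves nothing unless the counter is reset first.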
Prashanth Nageshappa
85c1e7dae1 sched: Reorder 'struct lb_env' members to reduce its size
Members of 'struct lb_env' are not in appropriate order to reuse compiler
added padding on 64bit architectures. In this patch we reorder those struct
members and help reduce the size of the structure from 96 bytes to 80
bytes on 64 bit architectures.

Suggested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06DDE.7000403@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:55:20 +02:00
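The padding effect behind 85c1e7dae1 is easy to see with two small structs. The members below are hypothetical and do not reflect the real 'struct lb_env'; the sketch only shows how ordering changes the amount of compiler-added padding.

```c
#include <stddef.h>

/* Small members interleaved with 8-byte members: each char is followed
 * by padding so the next pointer/long stays aligned. */
struct env_loose {
	char  flag_a;
	void *ptr;
	char  flag_b;
	long  count;
	char  flag_c;
};

/* Same members, largest first: the chars share one padded tail. */
struct env_packed {
	void *ptr;
	long  count;
	char  flag_a;
	char  flag_b;
	char  flag_c;
};

/* Bytes saved by reordering on the current target. */
static size_t padding_saved(void)
{
	return sizeof(struct env_loose) - sizeof(struct env_packed);
}
```

On a typical LP64 target the first layout occupies 40 bytes and the second 24; the exact numbers depend on the ABI, which is why the helper computes the difference instead of hard-coding it.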
Mike Galbraith
970e178985 sched: Improve scalability via 'CPU buddies', which withstand random perturbations
Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possibly quite large package.  Fix
that up by assigning a 'buddy' CPU to try to motivate.  Each buddy may try
to motivate that one other CPU, if it's busy, tough, it may then try its
SMT sibling, but that's all this optimization is allowed to cost.

Sibling cache buddies are cross-wired to prevent bouncing.

4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:

 clients     1       2       4        8       16       32       64      128
 ..........................................................................
 pre        30      41     118      645     3769     6214    12233    14312
 post      299     603    1211     2418     4697     6847    11606    14557

A nice increase in performance.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:53:34 +02:00
Srivatsa S. Bhat
a1cd2b13f7 cpusets: Remove/update outdated comments
cpuset_track_online_cpus() is no longer present. So remove the
outdated comment and replace it with reference to cpuset_update_active_cpus()
which is its equivalent.

Also, we don't lack memory hot-unplug anymore. And David Rientjes pointed
out how it is dealt with. So update that comment as well.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:53:28 +02:00
Srivatsa S. Bhat
7ddf96b02f cpusets, hotplug: Restructure functions that are invoked during hotplug
Separate out the cpuset related handling for CPU/Memory online/offline.
This also helps us exploit the most obvious and basic level of optimization
that any notification mechanism (CPU/Mem online/offline) has to offer us:
"We *know* why we have been invoked. So stop pretending that we are lost,
and do only the necessary amount of processing!".

And while at it, rename scan_for_empty_cpusets() to
scan_cpusets_upon_hotplug(), which is more appropriate considering how
it is restructured.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:53:22 +02:00
Srivatsa S. Bhat
80d1fa6463 cpusets, hotplug: Implement cpuset tree traversal in a helper function
At present, the functions that deal with cpusets during CPU/Mem hotplug
are quite messy, since a lot of the functionality is mixed up without clear
separation. And this takes a toll on optimization as well. For example,
the function cpuset_update_active_cpus() is called on both CPU offline and CPU
online events; and it invokes scan_for_empty_cpusets(), which makes sense
only for CPU offline events. And hence, the current code ends up unnecessarily
traversing the cpuset tree during CPU online also.

As a first step towards cleaning up those functions, encapsulate the cpuset
tree traversal in a helper function, so as to facilitate upcoming changes.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:53:18 +02:00