Commit Graph

400129 Commits

Author SHA1 Message Date
Michal Hocko
9809b18fcf watchdog: update watchdog_thresh properly
watchdog_tresh controls how often nmi perf event counter checks per-cpu
hrtimer_interrupts counter and blows up if the counter hasn't changed
since the last check.  The counter is updated by per-cpu
watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
period which guarantees that hrtimer is scheduled 2 times per the main
period.  Both hrtimer and perf event are started together when the
watchdog is enabled.

So far so good.  But...

But what happens when watchdog_thresh is updated from sysctl handler?

proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round.  The
problem, however, is that nobody tells the perf event that the sampling
period has changed so it is ticking with the period configured when it
has been set up.

This might result in an ear ripping dissonance between perf and hrtimer
parts if the watchdog_thresh is increased.  And even worse it might lead
to KABOOM if the watchdog is configured to panic on such a spurious
lockup.

This patch fixes the issue by updating both nmi perf even counter and
hrtimers if the threshold value has changed.

The nmi one is disabled and then reinitialized from scratch.  This has
an unpleasant side effect that the allocation of the new event might
fail theoretically so the hard lockup detector would be disabled for
such cpus.  On the other hand such a memory allocation failure is very
unlikely because the original event is deallocated right before.

It would be much nicer if we just changed perf event period but there
doesn't seem to be any API to do that right now.  It is also unfortunate
that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
cannot use on_each_cpu() and do the same thing from the per-cpu context.
The update from the current CPU should be safe because
perf_event_disable removes the event atomically before it clears the
per-cpu watchdog_ev so it cannot change anything under running handler
feet.

The hrtimer is simply restarted (thanks to Don Zickus who has pointed
this out) if it is queued because we cannot rely it will fire&adopt to
the new sampling period before a new nmi event triggers (when the
treshold is decreased).

[akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:25 -07:00
Michal Hocko
359e6fab66 watchdog: update watchdog attributes atomically
proc_dowatchdog doesn't synchronize multiple callers which might lead to
confusion when two parallel callers might confuse watchdog_enable_all_cpus
resp watchdog_disable_all_cpus (eg watchdog gets enabled even if
watchdog_thresh was set to 0 already).

This patch adds a local mutex which synchronizes callers to the sysctl
handler.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 17:00:25 -07:00
Linus Torvalds
e288e931c1 Merge branch 'bcache' (bcache fixes from Kent Overstreet)
Merge bcache fixes from Kent Overstreet:
 "There's fixes for _three_ different data corruption bugs, all of which
  were found by users hitting them in the wild.

  The first one isn't bcache specific - in 3.11 bcache was switched to
  the bio_copy_data in fs/bio.c, and that's when the bug in that code
  was discovered, but it's also used by raid1 and pktcdvd.  (That was my
  code too, so the bug's doubly embarassing given that it was or
  should've been just a cut and paste from bcache code.  Dunno what
  happened there).

  Most of these (all the non data corruption bugs, actually) were ready
  before the merge window and have been sitting in Jens' tree, but I
  don't know what's been up with him lately..."

* emailed patches from Kent Overstreet <kmo@daterainc.com>:
  bcache: Fix flushes in writeback mode
  bcache: Fix for handling overlapping extents when reading in a btree node
  bcache: Fix a shrinker deadlock
  bcache: Fix a dumb CPU spinning bug in writeback
  bcache: Fix a flush/fua performance bug
  bcache: Fix a writeback performance regression
  bcache: Correct printf()-style format length modifier
  bcache: Fix for when no journal entries are found
  bcache: Strip endline when writing the label through sysfs
  bcache: Fix a dumb journal discard bug
  block: Fix bio_copy_data()
2013-09-24 14:42:03 -07:00
Kent Overstreet
c0f04d88e4 bcache: Fix flushes in writeback mode
In writeback mode, when we get a cache flush we need to make sure we
issue a flush to the backing device.

The code for sending down an extra flush was wrong - by cloning the bio
we were probably getting flags that didn't make sense for a bare flush,
and also the old code was firing for FUA bios, for which we don't need
to send a flush to the backing device.

This was causing data corruption somehow - the mechanism was never
determined, but this patch fixes it for the users that were seeing it.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
84786438ed bcache: Fix for handling overlapping extents when reading in a btree node
btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.

This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
a698e08c82 bcache: Fix a shrinker deadlock
GFP_NOIO means we could be getting called recursively - mca_alloc() ->
mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then.
Whoops.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
79e3dab90d bcache: Fix a dumb CPU spinning bug in writeback
schedule_timeout() != schedule_timeout_uninterruptible()

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
1394d6761b bcache: Fix a flush/fua performance bug
bch_journal_meta() was missing the flush to make the journal write
actually go down (instead of waiting up to journal_delay_ms)...

Whoops

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
c2a4f3183a bcache: Fix a writeback performance regression
Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.

When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - they still
take up slots in the keybuf (so that foreground writes can check for
them to avoid races) - without that wait, we'll continually rescan when
we'll be able to add at most a key or two to the keybuf, and that takes
locks that starves foreground IO.  Doh.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Geert Uytterhoeven
61cbd250f8 bcache: Correct printf()-style format length modifier
Fix

  drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
  drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
c426c4fd46 bcache: Fix for when no journal entries are found
The journal replay code didn't handle this case, causing it to go into
an infinite loop...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Gabriel de Perthuis
aee6f1cfff bcache: Strip endline when writing the label through sysfs
sysfs attributes with unusual characters have crappy failure modes
in Squeeze (udev 164); later versions of udev are unaffected.

This should make these characters more unusual.

Signed-off-by: Gabriel de Perthuis <g2p.code@gmail.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
6d9d21e35f bcache: Fix a dumb journal discard bug
That switch statement was obviously wrong, leading to some sort of weird
spinning on rare occasion with discards enabled...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:43 -07:00
Kent Overstreet
2f6cf0de02 block: Fix bio_copy_data()
The memcpy() in bio_copy_data() was using the wrong offset vars, leading
to data corruption in weird unusual setups.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.9
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 14:41:42 -07:00
David Vrabel
24f69373e2 xen/balloon: don't alloc page while non-preemptible
get_balloon_scratch_page() disables preemption so we cannot call
alloc_page() in between get/put_balloon_scratch_page().  Shuffle bits
around in decrease_reservation() to avoid this.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-24 16:22:27 -04:00
Konrad Rzeszutek Wilk
a945928ea2 xen: Do not enable spinlocks before jump_label_init() has executed
xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be  a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-24 16:22:26 -04:00
Jason Gunthorpe
bf4a7c054b tpm: xen-tpmfront: Remove the locality sysfs attribute
Upon deeper review it was agreed to remove the driver-unique
'locality' sysfs attribute before it is present in a released
kernel.

The attribute was introduced in e2683957fb
during the 3.12 merge window, so this patch needs to go in before
3.12 is released.

The hope is to have a well defined locality API that all the other
locality aware drivers can use, perhaps in 3.13.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
2013-09-24 16:15:15 -04:00
Jason Gunthorpe
56be88954b tpm: xen-tpmfront: Fix default durations
All the default durations were being set to 10 minutes which is
way too long for the timeouts. Normal values for the longest
duration are around 5 mins, and short duration ar around .5s.

Further, these are just the default, tpm_get_timeouts will set
them to values from the TPM (or throw an error).

Just remove them.

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-24 16:14:55 -04:00
Daniel Vetter
67c72a1225 drm/i915: preserve pipe A quirk in i9xx_set_pipeconf
This regression has been introduced in

commit 9f11a9e4e5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jun 13 00:54:58 2013 +0200

    drm/i915: set up PIPECONF explicitly for i9xx/vlv platforms

Ville brough up the idea that this is just the pipe A quirk gone
wrong.

Note that after resume the bios might or might not have enabled pipe A
already.  We have a bit of magic to make sure that on resume we set up
a decent mode for pipe A, but I fear if I just smash pipe A to always
on we'd enable it in a bogus state and hang the hw. Hence the
readback.

v2: Clarify the logic a bit as suggested by Chris. Also amend the
commit message to clarify why we don't unconditionally enable the
pipe.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66462
References: https://lkml.org/lkml/2013/8/26/238
Cc: Meelis Roos <mroos@ut.ee>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use |= instead of = as suggested by Chris.]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 20:39:00 +02:00
Daniel Vetter
1062b81598 drm/i915/tv: clear adjusted_mode.flags
The native TV encoder has it's own flags to adjust sync modes and
enabled interlaced modes which are totally irrelevant for the adjusted
mode. This worked out nicely since the input modes used by both the
load detect code and reported in the ->get_modes callbacks all have no
flags set, and we also don't fill out any of them in the ->get_config
callback.

This changed with the additional sanitation done with

commit 2960bc9cce
Author: Imre Deak <imre.deak@intel.com>
Date:   Tue Jul 30 13:36:32 2013 +0300

    drm/i915: make user mode sync polarity setting explicit

sinc now the "no flags at all" state wouldn't fit through core code
any more. So fix this up again by explicitly clearing the flags in the
->compute_config callback.

Aside: We have zero checking in place to make sure that the requested
mode is indeed the right input mode we want for the selected TV mode.
So we'll happily fall over if userspace tries to pull us.  But that's
definitely work for a different patch series. So just add a FIXME
comment for now.

Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Knut Petersen <Knut_Petersen@t-online.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 20:38:59 +02:00
Jani Nikula
8d16f25821 drm/i915/dp: increase i2c-over-aux retry interval on AUX DEFER
There is no clear cut rules or specs for the retry interval, as there
are many factors that affect overall response time. Increase the
interval, and even more so on branch devices which may have limited i2c
bit rates.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=60263
Tested-by: Nicolas Suzor <nic@suzor.com>
Reviewed-by: Todd Previte <tprevite@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 20:38:54 +02:00
Dave Chinner
566055d33a xfs: log recovery lsn ordering needs uuid check
After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.

The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.

Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.

More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.

Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.

This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:35:57 -05:00
Dave Chinner
b771af2fcb xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
It uses a kernel internal structure in it's definition rather than
the user visible structure that is passed to the ioctl.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:35:08 -05:00
Dave Chinner
b313a5f1cb xfs: asserting lock not held during freeing not valid
When we free an inode, we do so via RCU. As an RCU lookup can occur
at any time before we free an inode, and that lookup takes the inode
flags lock, we cannot safely assert that the flags lock is not held
just before marking it dead and running call_rcu() to free the
inode.

We check on allocation of a new inode structre that the lock is not
held, so we still have protection against locks being leaked and
hence not correctly initialised when allocated out of the slab.
Hence just remove the assert...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:32:57 -05:00
Dave Chinner
4885235806 xfs: lock the AIL before removing the buffer item
Regression introduced by commit 46f9d2e ("xfs: aborted buf items can
be in the AIL") which fails to lock the AIL before removing the
item. Spinlock debugging throws a warning about this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-24 12:31:41 -05:00
David Ahern
384c671e33 perf trace: Add mmap2 handler
5c5e854b changed perf_event__synthesize_mmap_events to generate MMAP2
events. Since perf-trace does not have a handler for it it dies with a
segfault when trying to process files:

perf trace -i /tmp/perf.data
Segmentation fault

Signed-off-by: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1379900700-5186-4-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-24 14:15:51 -03:00
Jiri Olsa
4921e32024 perf kmem: Make it work again on non NUMA machines
The commit '2814eb0 perf kmem: Remove die() calls' disabled 'perf kmem'
command for machines without numa support. It made the command fail if
'/sys/devices/system/node' dir wasn't found.

Skipping the numa based initialization in case the directory is not
found and continue execution.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1379003976-5839-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-09-24 14:13:46 -03:00
Russell King
db6aaf4d55 drm/i2c: tda998x: fix audio muting
Fix a bug that was introduced in commit c4c11dd160 ("drm/i2c: tda998x:
add video and audio input configuration") when Sebastian cleaned up my
original patch.  Without this being fixed, audio is muted when the
display is turned off, never to be re-enabled.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Darren Etheridge <detheridge@ti.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 09:41:18 -07:00
Davidlohr Bueso
53dad6d3a8 ipc: fix race with LSMs
Currently, IPC mechanisms do security and auditing related checks under
RCU.  However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition.  Manfred illustrates this nicely,
for instance with shared mem and selinux:

 -> do_shmat calls rcu_read_lock()
 -> do_shmat calls shm_object_check().
     Checks that the object is still valid - but doesn't acquire any locks.
     Then it returns.
 -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
 -> selinux_shm_shmat calls ipc_has_perm()
 -> ipc_has_perm accesses ipc_perms->security

shm_close()
 -> shm_close acquires rw_mutex & shm_lock
 -> shm_close calls shm_destroy
 -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
 -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
 -> ipc_free_security calls kfree(ipc_perms->security)

This patch delays the freeing of the security structures after all RCU
readers are done.  Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept.  Linus states:

 "... the old behavior was suspect for another reason too: having the
  security blob go away from under a user sounds like it could cause
  various other problems anyway, so I think the old code was at least
  _prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 09:36:53 -07:00
Alex Deucher
13c5bfdad7 drm/radeon/cik: fix overflow in vram fetch
Missing ULL when calculating the amount of vram
leads to an overflow when the amount of vram is >= 4G.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-09-24 11:01:24 -04:00
Alex Deucher
99d79aa2f3 drm/radeon: add missing hdmi callbacks for rv6xx
When dpm was merged, I added a new asic struct for
rv6xx, but it never got properly updated when the
hdmi callbacks were added due to the two patch sets
being developed in parallel.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=69729

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
2013-09-24 11:00:59 -04:00
Jeff Mahoney
721a769c03 reiserfs: fix race with flush_used_journal_lists and flush_journal_list
There are two locks involved in managing the journal lists. The general
reiserfs_write_lock and the journal->j_flush_mutex.

While flush_journal_list is sleeping to acquire the j_flush_mutex or to
submit a block for write, it will drop the write lock. This allows
another thread to acquire the write lock and ultimately call
flush_used_journal_lists to traverse the list of journal lists and
select one for flushing. It can select the journal_list that has just
had flush_journal_list called on it in the original thread and call it
again with the same journal_list.

The second thread then drops the write lock to acquire j_flush_mutex and
the first thread reacquires it and continues execution and eventually
clears and frees the journal list before dropping j_flush_mutex and
returning.

The second thread acquires j_flush_mutex and ends up operating on a
journal_list that has already been released. If the memory hasn't
been reused, we'll soon after hit a BUG_ON because the transaction id
has already been cleared. If it's been reused, we'll crash in other
fun ways.

Since flush_journal_list will synchronize on j_flush_mutex, we can fix
the race by taking a proper reference in flush_used_journal_lists
and checking to see if it's still valid after the mutex is taken. It's
safe to iterate the list of journal lists and pick a list with
just the write lock as long as a reference is taken on the journal list
before we drop the lock. We already have code to handle whether a
transaction has been flushed already so we can use that to handle the
race and get rid of the trans_id BUG_ON.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-09-24 11:24:21 +02:00
Jeff Mahoney
7bc9cc07ee reiserfs: remove useless flush_old_journal_lists
Commit a3172027 introduced test_transaction as a requirement for
flushing old lists -- but it can never return 1 unless the transaction
has already been flushed.

As a result, we have a routine that iterates the j_realblocks list but
doesn't actually do anything. Since it's been this way since 2006 and
the latency numbers were what Chris expected, let's just rip it out.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-09-24 11:24:21 +02:00
Jan Kara
69d75671d9 udf: Fortify LVID loading
A user has reported an oops in udf_statfs() that was caused by
numOfPartitions entry in LVID structure being corrupted. Fix the problem
by verifying whether numOfPartitions makes sense at least to the extent
that LVID fits into a single block as it should.

Reported-by: Juergen Weigert <jw@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-09-24 11:23:33 +02:00
Maciej W. Rozycki
becee6b8c7 MIPS: cpu-features.h: s/MIPS53/MIPS64/
No support for MIPS53 processors yet.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5876/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-09-24 11:07:18 +02:00
Chris Wilson
e29bb4ebbf drm/i915: Use a temporary va_list for two-pass string handling
In

commit edc3d8848d
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 23 13:55:35 2013 +0300

    drm/i915: avoid big kmallocs on reading error state

we introduce a two-pass mechanism for splitting long strings being
formatted into the error-state. The first pass finds the length, and the
second pass emits the right portion of the string into the accumulation
buffer. Unfortunately we use the same va_list for both passes, resulting
in the second pass reading garbage off the end of the argument list. As
the two passes are only used for boundaries between read() calls, the
corruption is only rarely seen.

This fixes the root cause behind

commit baf27f9b17
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 29 23:26:50 2013 +0100

    drm/i915: Break up the large vsnprintf() in print_error_buffers()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-24 09:36:40 +02:00
Xenia Ragiadakou
38d7f68851 usbcore: check usb device's state before sending a Set SEL control transfer
Set SEL control urbs cannot be sent to a device in unconfigured state.
This patch adds a check in usb_req_set_sel() to ensure the usb device's
state is USB_STATE_CONFIGURED.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reported-by: Martin MOKREJS <mmokrejs@gmail.com>
Suggested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2013-09-23 15:43:32 -07:00
Florian Wolter
526867c3ca xhci: Fix race between ep halt and URB cancellation
The halted state of a endpoint cannot be cleared over CLEAR_HALT from a
user process, because the stopped_td variable was overwritten in the
handle_stopped_endpoint() function. So the xhci_endpoint_reset() function will
refuse the reset and communication with device can not run over this endpoint.
https://bugzilla.kernel.org/show_bug.cgi?id=60699

Signed-off-by: Florian Wolter <wolly84@web.de>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2013-09-23 15:43:31 -07:00
Sarah Sharp
8b3d45705e usb: Fix xHCI host issues on remote wakeup.
When a device signals remote wakeup on a roothub, and the suspend change
bit is set, the host controller driver must not give control back to the
USB core until the port goes back into the active state.

EHCI accomplishes this by waiting in the get port status function until
the PORT_RESUME bit is cleared:

                        /* stop resume signaling */
                        temp &= ~(PORT_RWC_BITS | PORT_SUSPEND | PORT_RESUME);
                        ehci_writel(ehci, temp, status_reg);
                        clear_bit(wIndex, &ehci->resuming_ports);
                        retval = ehci_handshake(ehci, status_reg,
                                        PORT_RESUME, 0, 2000 /* 2msec */);

Similarly, the xHCI host should wait until the port goes into U0, before
passing control up to the USB core.  When the port transitions from the
RExit state to U0, the xHCI driver will get a port status change event.
We need to wait for that event before passing control up to the USB
core.

After the port transitions to the active state, the USB core should time
a recovery interval before it talks to the device.  The length of that
recovery interval is TRSMRCY, 10 ms, mentioned in the USB 2.0 spec,
section 7.1.7.7.  The previous xHCI code (which did not wait for the
port to go into U0) would cause the USB core to violate that recovery
interval.

This bug caused numerous USB device disconnects on remote wakeup under
ChromeOS and a Lynx Point LP xHCI host that takes up to 20 ms to move
from RExit to U0.  ChromeOS is very aggressive about power savings, and
sets the autosuspend_delay to 100 ms, and disables USB persist.

I attempted to replicate this bug with Ubuntu 12.04, but could not.  I
used Ubuntu 12.04 on the same platform, with the same BIOS that the bug
was triggered on ChromeOS with.  I also changed the USB sysfs settings
as described above, but still could not reproduce the bug under Ubuntu.
It may be that ChromeOS userspace triggers this bug through additional
settings.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
2013-09-23 15:43:31 -07:00
Mathias Nyman
ec7e43e2d9 xhci: Ensure a command structure points to the correct trb on the command ring
If a command on the command ring needs to be cancelled before it is handled
it can be turned to a no-op operation when the ring is stopped.
We want to store the command ring enqueue pointer in the command structure
when the command in enqueued for the cancellation case.

Some commands used to store the command ring dequeue pointers instead of enqueue
(these often worked because enqueue happends to equal dequeue quite often)

Other commands correctly used the enqueue pointer but did not check if it pointed
to a valid trb or a link trb, this caused for example stop endpoint command to timeout in
xhci_stop_device() in about 2% of suspend/resume cases.

This should also solve some weird behavior happening in command cancellation cases.

This patch is based on a patch submitted by Sarah Sharp to linux-usb, but
then forgotten:
    http://marc.info/?l=linux-usb&m=136269803207465&w=2

This patch should be backported to kernels as old as 3.7, that contain
the commit b92cc66c04 "xHCI: add aborting
command ring function"

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org
2013-09-23 15:43:30 -07:00
Mathias Nyman
284d205524 xhci: Fix oops happening after address device timeout
When a command times out, the command ring is first aborted,
and then stopped. If the command ring is empty when it is stopped
the stop event will point to next command which is not yet set.
xHCI tries to handle this next event often causing an oops.

Don't handle command completion events on stopped cmd ring if ring is
empty.

This patch should be backported to kernels as old as 3.7, that contain
the commit b92cc66c04 "xHCI: add aborting
command ring function"

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reported-by: Giovanni <giovanni.nervi@yahoo.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org
2013-09-23 15:43:30 -07:00
Linus Torvalds
4a10c2ac2f Linux 3.12-rc2 2013-09-23 15:41:09 -07:00
Linus Torvalds
9d23108df3 Staging fixes for 3.12-rc2
Here are a number of small staging tree and iio driver fixes.  Nothing major,
 just lots of little things.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJAdRIACgkQMUfUDdst+ymS6QCfYTqZCbIvEbdSmkbaVFkMDsoE
 J4gAoKeQAL9ltn1fh65XDKlQSwLJaML3
 =USmd
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging fixes from Greg KH:
 "Here are a number of small staging tree and iio driver fixes.  Nothing
  major, just lots of little things"

* tag 'staging-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (34 commits)
  iio:buffer_cb: Add missing iio_buffer_init()
  iio: Prevent race between IIO chardev opening and IIO device free
  iio: fix: Keep a reference to the IIO device for open file descriptors
  iio: Stop sampling when the device is removed
  iio: Fix crash when scan_bytes is computed with active_scan_mask == NULL
  iio: Fix mcp4725 dev-to-indio_dev conversion in suspend/resume
  iio: Fix bma180 dev-to-indio_dev conversion in suspend/resume
  iio: Fix tmp006 dev-to-indio_dev conversion in suspend/resume
  iio: iio_device_add_event_sysfs() bugfix
  staging: iio: ade7854-spi: Fix return value
  staging:iio:hmc5843: Fix measurement conversion
  iio: isl29018: Fix uninitialized value
  staging:iio:dummy fix kfifo_buf kconfig dependency issue if kfifo modular and buffer enabled for built in dummy driver.
  iio: at91: fix adc_clk overflow
  staging: line6: add bounds check in snd_toneport_source_put()
  Staging: comedi: Fix dependencies for drivers misclassified as PCI
  staging: r8188eu: Adjust RX gain
  staging: r8188eu: Fix smatch warning in core/rtw_ieee80211.
  staging: r8188eu: Fix smatch error in core/rtw_mlme_ext.c
  staging: r8188eu: Fix Smatch off-by-one warning in hal/rtl8188e_hal_init.c
  ...
2013-09-23 12:53:07 -07:00
Linus Torvalds
e04a0a5ab9 USB fixes for 3.12-rc2
Here are a number of small USB fixes for 3.12-rc2.
 
 One is a revert of a EHCI change that isn't quite ready for 3.12.  Others are
 minor things, gadget fixes, Kconfig fixes, and some quirks and documentation
 updates.
 
 All have been in linux-next for a bit.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlJAapIACgkQMUfUDdst+yksHQCfeYJVtsWa5aG1OgLGmVC7HzGW
 SNMAniUFk9Cg9AazfBNfURsfuucEmv6w
 =kf93
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of small USB fixes for 3.12-rc2.

  One is a revert of a EHCI change that isn't quite ready for 3.12.
  Others are minor things, gadget fixes, Kconfig fixes, and some quirks
  and documentation updates.

  All have been in linux-next for a bit"

* tag 'usb-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: pl2303: distinguish between original and cloned HX chips
  USB: Faraday fotg210: fix email addresses
  USB: fix typo in usb serial simple driver Kconfig
  Revert "USB: EHCI: support running URB giveback in tasklet context"
  usb: s3c-hsotg: do not disconnect gadget when receiving ErlySusp intr
  usb: s3c-hsotg: fix unregistration function
  usb: gadget: f_mass_storage: reset endpoint driver data when disabled
  usb: host: fsl-mph-dr-of: Staticize local symbols
  usb: gadget: f_eem: Staticize eem_alloc
  usb: gadget: f_ecm: Staticize ecm_alloc
  usb: phy: omap-usb3: Fix return value
  usb: dwc3: gadget: avoid memory leak when failing to allocate all eps
  usb: dwc3: remove extcon dependency
  usb: gadget: add '__ref' for rndis_config_register() and cdc_config_register()
  usb: dwc3: pci: add support for BayTrail
  usb: gadget: cdc2: fix conversion to new interface of f_ecm
  usb: gadget: fix a bug and a WARN_ON in dummy-hcd
  usb: gadget: mv_u3d_core: fix violation of locking discipline in mv_u3d_ep_disable()
2013-09-23 12:52:35 -07:00
Christian König
4b40e59212 drm/radeon/uvd: lower msg&fb buffer requirements on UVD3
Starting with UVD3 message and feedback buffers have their
own 256MB segment, so no need to force them into VRAM any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-09-23 11:00:12 -04:00
Alex Deucher
4a1132a023 drm/radeon: disable tests/benchmarks if accel is disabled
The tests are only usable if the acceleration engines have
been successfully initialized.

Based on an initial patch from: Alex Ivanov <gnidorah@p0n4ik.tk>

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-09-23 10:53:18 -04:00
Mike Snitzer
e8603136cb dm: add reserved_bio_based_ios module parameter
Allow user to change the number of IOs that are reserved by
bio-based DM's mempools by writing to this file:
/sys/module/dm_mod/parameters/reserved_bio_based_ios

The default value is RESERVED_BIO_BASED_IOS (16).  The maximum allowed
value is RESERVED_MAX_IOS (1024).

Export dm_get_reserved_bio_based_ios() for use by DM targets and core
code.  Switch to sizing dm-io's mempool and bioset using DM core's
configurable 'reserved_bio_based_ios'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
2013-09-23 10:42:24 -04:00
Mike Snitzer
f47908269f dm: add reserved_rq_based_ios module parameter
Allow user to change the number of IOs that are reserved by
request-based DM's mempools by writing to this file:
/sys/module/dm_mod/parameters/reserved_rq_based_ios

The default value is RESERVED_REQUEST_BASED_IOS (256).  The maximum
allowed value is RESERVED_MAX_IOS (1024).

Export dm_get_reserved_rq_based_ios() for use by DM targets and core
code.  Switch to sizing dm-mpath's mempool using DM core's configurable
'reserved_rq_based_ios'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
2013-09-23 10:42:24 -04:00
Mike Snitzer
6cfa58573f dm: lower bio-based mempool reservation
Bio-based device mapper processing doesn't need larger mempools (like
request-based DM does), so lower the number of reserved entries for
bio-based operation.  16 was already used for bio-based DM's bioset
but mistakenly wasn't used for it's _io_cache.

Formalize difference between bio-based and request-based defaults by
introducing RESERVED_BIO_BASED_IOS and RESERVED_REQUEST_BASED_IOS.

(based on older code from Mikulas Patocka)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
2013-09-23 10:42:23 -04:00
Mike Snitzer
b60ab990cc dm thin: do not expose non-zero discard limits if discards disabled
Fix issue where the block layer would stack the discard limits of the
pool's data device even if the "ignore_discard" pool feature was
specified.

The pool and thin device(s) still had discards disabled because the
QUEUE_FLAG_DISCARD request_queue flag wasn't set.  But to avoid user
confusion when "ignore_discard" is used: both the pool device and the
thin device(s) have zeroes for all discard limits.

Also, always set discard_zeroes_data_unsupported in targets because they
should never advertise the 'discard_zeroes_data' capability (even if the
pool's data device supports it).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2013-09-23 10:42:06 -04:00