Commit Graph

300439 Commits

Author SHA1 Message Date
Jim Meyering
f07c9a79f0 Btrfs: avoid buffer overrun in btrfs_printk
The buffer read-overrun would be triggered by a printk format
starting with <N>, where N is a single digit.  NUL-terminate
after strncpy.  Use memcpy, not strncpy, since we know the
string we're copying fits in the destination buffer and
contains no NUL byte.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30 10:23:31 -04:00
Daniel J Blueman
2eec6c8102 Fix minor type issues
Address some minor type issues identified by sparse checker.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-05-30 10:23:30 -04:00
Sergei Trofimovich
0d2450abfa btrfs: allow changing 'thread_pool' size at remount time
Changing 'mount -oremount,thread_pool=2 /' didn't make any effect:

maximum amount of worker threads is specified in 2 places:
- in 'strict btrfs_fs_info::thread_pool_size'
- in each worker struct: 'struct btrfs_workers::max_workers'

'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

Fix it by pushing new maximum value to all created worker structures
as well.

Cc: Josef Bacik <josef@redhat.com>
Cc: Chris Mason <chris.mason@oracle.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-05-30 10:23:30 -04:00
Josef Bacik
0885ef5b56 Btrfs: do not do filemap_write_and_wait_range in fsync
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:29 -04:00
Josef Bacik
551ebb2d34 Btrfs: remove useless waiting and extra filemap work
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting.  Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again.  We will even check to see if
there is delalloc pages on this range and loop again.  So this patch gets
rid of the multipe fdata_write() calls and just does
filemap_write_and_wait().  In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it.  All callers
either 1) don't care, they just want to make sure what they just wrote maeks
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible.  The delaloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
d7dbe9e7f6 Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,
Btrfs: fix compile warnings in extent_io.c

These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
30f8fe3e47 Btrfs: cache no acl on new inodes
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any.  Doing this adds a
little bit of a bump to my compilebench runs.  Thanks,
Btrfs: cache no acl on new inodes

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Josef Bacik
0c4d2d95d0 Btrfs: use i_version instead of our own sequence
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Jan Schmidt
20b297d620 Btrfs: tree mod log sanity checks in join_transaction
When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.

We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:36 +02:00
Jan Schmidt
19ae4e8133 Btrfs: fs_info variable for join_transaction
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:35 +02:00
Jan Schmidt
8445f61cad Btrfs: use the tree modification log for backref resolving
This enables backref resolving on life trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:34 +02:00
Jan Schmidt
5d9e75c41d Btrfs: add btrfs_search_old_slot
The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:33 +02:00
Jan Schmidt
f3ea38da3e Btrfs: add del_ptr and insert_ptr modifications to the tree mod log
Record all relevant modifications to block pointers in the tree mod log so
that we can rewind them later on for backref walking.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:32 +02:00
Jan Schmidt
f230475e62 Btrfs: put all block modifications into the tree mod log
When running functions that can make changes to the internal trees
(e.g. btrfs_search_slot), we check if somebody may be interested in the
block we're currently modifying. If so, we record our modification to be
able to rewind it later on.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:29 +02:00
Jan Schmidt
bd989ba359 Btrfs: add tree modification log functions
The tree mod log will log modifications made fs-tree nodes. Most
modifications are done by autobalance of the tree. Such changes are recorded
as long as a block entry exists. When released, the log is cleaned.

With the tree modification log, it's possible to reconstruct a consistent
old state of the tree. This is required to do backref walking on a busy
file system.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:01 +02:00
Jan Schmidt
f29021b29a Btrfs: add tree mod log to fs_info
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:54 +02:00
Jan Schmidt
815a51c74a Btrfs: dummy extent buffers for tree mod log
The tree modification log needs two ways to create dummy extent buffers,
once by allocating a fresh one (to rebuild an old root) and once by
cloning an existing one (to make private rewind modifications) to it.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:54 +02:00
Jan Schmidt
64947ec0d1 Btrfs: move struct seq_list to ctree.h
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:53 +02:00
Jan Schmidt
5581a51a59 Btrfs: don't set for_cow parameter for tree block functions
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed
parameter for_cow = 1. In fact, these two functions should never mark
their tree modification operations as for_cow, because they can change
the number of blocks referenced by a tree.

Hence, we remove the extra for_cow parameter from these functions and
make them pass a zero down.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:53 +02:00
Jan Schmidt
976b1908d9 Btrfs: look into the extent during find_all_leafs
Before this patch we called find_all_leafs for a data extent, then called
find_all_roots and then looked into the extent to grab the information
we were seeking. This was done without holding the leaves locked to avoid
deadlocks. However, this can obviouly race with concurrent tree
modifications.

Instead, we now look into the extent while we're holding the lock during
find_all_leafs and store this information together with the leaf list.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:52 +02:00
Jan Schmidt
d5c88b735f Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs
The key we store with a tree block backref is only a hint. It is set when
the ref is created and can remain correct for a long time. As the tree is
rebalanced, however, eventually the key no longer points to the correct
destination.

With this patch, we change find_parent_nodes to no longer add keys unless it
knows for sure they're correct (e.g. because they're for an extent data
backref). Then when we later encounter a backref ref with no parent and no
key set, we grab the block and take the first key from the block itself.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:51 +02:00
Jan Schmidt
dadcaf78b5 Btrfs: bugfix in btrfs_find_parent_nodes
That one has been around since the addition of backref.c. Due to the way we
calculate our slot numbers, after adding inline refs we're missing one keyed
ref unless it's located at the beginning of a new leaf.

Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:51 +02:00
Jan Schmidt
cd1b413c5c Btrfs: ulist realloc bugfix
ulist_next gets the pointer to the previously returned element to find the
next element from there. However, when we call ulist_add while iteration
with ulist_next is in progress (ulist explicitly supports this), we can
realloc the ulist internal memory, which makes the pointer to the previous
element useless.

Instead, we now use an iterator parameter that's independent from the
internal pointers.

Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-26 12:17:49 +02:00
Linus Torvalds
76e10d158e Linux 3.4 2012-05-20 15:29:13 -07:00
Linus Torvalds
d6c7797367 PARISC fixes on 20120519
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPt7wpAAoJEDeqqVYsXL0MgZUIAL6SMaBmgCZxZEb1pCymrn3c
 DLESwPGoA5lPs62ojZ1C8jhZhmEA7nfv+iDDCA/YYMPbyctD3ZH7moHgEJCHyvqJ
 9SByxT3uuYU4fxfQ1xxUQOe96gpuS9zBhvUYrfP6+/hdZakBAPqCWxVTqz1eET90
 2V09EBGiiTXwtFt9KZ640Tg1p+MBM8tI/lVaq8DCU4Sj99YNV9ZC26j6UszyI/NC
 K6doZjQ7nfCG/Ul88MwCH/akqCqupscwty1iXvuFTExVF5jfuCmMN2aaIShDlBPE
 ygAkc3j611FDcZXMnNbwUsD/jduR4V7FGQ2yg6U0D/IYZ+07aYC/51qksOu+amY=
 =KV0L
 -----END PGP SIGNATURE-----

Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6

Pull PA-RISC fixes from James Bottomley:
 "This is a set of three bug fixes that gets parisc running again on
  systems with PA1.1 processors.

  Two fix regressions introduced in 2.6.39 and one fixes a prefetch bug
  that only affects PA7300LC processors.  We also have another pending
  fix to do with the sectional arrangement of vmlinux.lds, but there's a
  query on it during testing on one particular system type, so I'll hold
  off sending it in for now."

* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] fix panic on prefetch(NULL) on PA7300LC
  [PARISC] fix crash in flush_icache_page_asm on PA1.1
  [PARISC] fix PA1.1 oops on boot
2012-05-19 15:30:15 -07:00
Linus Torvalds
5d1204582e Merge branch 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 linker bug workarounds from Peter Anvin.

GNU ld-2.22.52.0.[12] (*) has an unfortunate bug where it incorrectly
turns certain relocation entries absolute.  Section-relative symbols
that are part of otherwise empty sections are silently changed them to
absolute.  We rely on section-relative symbols staying section-relative,
and actually have several sections in the linker script solely for this
purpose.

See for example

   http://sourceware.org/bugzilla/show_bug.cgi?id=14052

We could just black-list the buggy linker, but it appears that it got
shipped in at least F17, and possibly other distros too, so it's sadly
not some rare unusual case.

This backports the workaround from the x86/trampoline branch, and as
Peter says: "This is not a minimal fix, not at all, but it is a tested
code base."

* 'x86/ld-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, relocs: When printing an error, say relative or absolute
  x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
  x86, realmode: 16-bit real-mode code support for relocs tool

(*) That's a manly release numbering system. Stupid, sure. But manly.
2012-05-19 15:28:22 -07:00
Linus Torvalds
14e931a264 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A few small, but important fixes.  Most of them are marked for stable
  as well

   - Fix failure to release a semaphore on error path in mtip32xx.
   - Fix crashable condition in bio_get_nr_vecs().
   - Don't mark end-of-disk buffers as mapped, limit it to i_size.
   - Fix for build problem with CONFIG_BLOCK=n on arm at least.
   - Fix for a buffer overlow on UUID partition printing.
   - Trivial removal of unused variables in dac960."

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix buffer overflow when printing partition UUIDs
  Fix blkdev.h build errors when BLOCK=n
  bio allocation failure due to bio_get_nr_vecs()
  block: don't mark buffers beyond end of disk as mapped
  mtip32xx: release the semaphore on an error path
  dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-19 10:12:17 -07:00
Linus Torvalds
a2ae978756 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull one more networking bug-fix from David Miller:
 "One last straggler.

  Eric Dumazet's pktgen unload oops fix was not entirely complete, but
  all the cases should be handled properly now....  fingers crossed."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  pktgen: fix module unload for good
2012-05-19 10:10:59 -07:00
Hugh Dickins
62ade86ab6 memcg,thp: fix res_counter:96 regression
Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
a flurry of hundreds of warnings at kernel/res_counter.c:96, where
res_counter_uncharge_locked() does WARN_ON(counter->usage < val).

The first trace of each flurry implicates __mem_cgroup_cancel_charge()
of mc.precharge, and an audit of mc.precharge handling points to
mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850e8
("memcg: avoid THP split in task migration").

Checking !mc.precharge is good everywhere else, when a single page is to
be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
follow, is liable to result in underflow (a lot can change since the
precharge was estimated).

Simply check against HPAGE_PMD_NR: there's probably a better
alternative, trying precharge for more, splitting if unsuccessful; but
this one-liner is safer for now - no kernel/res_counter.c:96 warnings
seen in 26 hours.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-19 10:10:27 -07:00
H. Peter Anvin
24ab82bd9b x86, relocs: When printing an error, say relative or absolute
When the relocs tool throws an error, let the error message say if it
is an absolute or relative symbol.  This should make it a lot more
clear what action the programmer needs to take and should help us find
the reason if additional symbol bugs show up.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:50:02 -07:00
H. Peter Anvin
a3e854d95a x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
GNU ld 2.22.52.0.1 has a bug that it blindly changes symbols from
section-relative to absolute if they are in a section of zero length.
This turns the symbols __init_begin and __init_end into absolute
symbols.  Let the relocs program know that those should be treated as
relative symbols.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
2012-05-18 19:50:00 -07:00
H. Peter Anvin
6520fe5564 x86, realmode: 16-bit real-mode code support for relocs tool
A new option is added to the relocs tool called '--realmode'.
This option causes the generation of 16-bit segment relocations
and 32-bit linear relocations for the real-mode code. When
the real-mode code is moved to the low-memory during kernel
initialization, these relocation entries can be used to
relocate the code properly.

In the assembly code 16-bit segment relocations must be relative
to the 'real_mode_seg' absolute symbol. Linear relocations must be
relative to a symbol prefixed with 'pa_'.

16-bit segment relocation is used to load cs:ip in 16-bit code.
Linear relocations are used in the 32-bit code for relocatable
data references. They are declared in the linker script of the
real-mode code.

The relocs tool is moved to arch/x86/tools/relocs.c, and added new
target archscripts that can be used to build scripts needed building
an architecture.  be compiled before building the arch/x86 tree.

[ hpa: accelerating this because it detects invalid absolute
  relocations, a serious bug in binutils 2.22.52.0.x which currently
  produces bad kernels. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1336501366-28617-2-git-send-email-jarkko.sakkinen@intel.com
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-05-18 19:49:40 -07:00
Linus Torvalds
b1dab2f040 A fix to the thin provisioning userspace interface.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPtuPVAAoJEK2W1qbAHj1n3q8QAJaM5OkZ3DqAoC8XznVBKO1B
 Fr5d0mui4irC9ce+End/MCoPN2EcwQ5bzyh6cQXqxfUOTR5RjYMCn97uID0I35Cc
 AEbDaUMSgKkQkKaDeVpM54SaHBhtLP95gIZmo774etmryzn3HyWiiYOsvswnAw8w
 /zIZwPkRPoskoGt1Hk/o490qYGwstJ6JErJndyuVEdHNWF6xiL9cFAi81oo1UF2J
 lysWmBBaisRcAD3kzbTBIQ3ntMUPSbvM7ZLUaJXEbXwg+XUBRkBfQLXJk3wA5q8R
 QacTz+kvAXG6B+DipFOYyDTkxR+s5kjRserlBwh2gHP9u3pHV9Nf7eWR5BWsYPbV
 tLH8Q/0Ctl31MsrxMl7cO05Kj2sMI/lOg/yGQT3Qx6k0HVtbU1gaR6J21RmpElGC
 TANXpBhV5MRpfZFvwnHOSACnE3hMm1i4XJstTejIfPiNtq8aeL0eSB3Q+ZJTEZzm
 ZCk23ufmaqC75RrYt1P3F5QpxsjD+moMmZT5hZSkGOd0YqvwAMfv9O/5FesQ748R
 tc5zPxS/dTONseGkBaAPeHpcX4WP0wm0fKJcPb0Y/4kx0AkiVQjbUrNNcnVxo5Pw
 pF1MbxY14xZxteZ/UHWmPvyCuJ8NICTtswxj9taquSDb87pliV2xajrK4Mnmed7u
 sLxA0ANWBdJghss/r/g6
 =03zw
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull a dm fix from Alasdair G Kergon:
 "A fix to the thin provisioning userspace interface."

* tag 'dm-3.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm thin: fix table output when pool target disables discard passdown internally
2012-05-18 18:22:45 -07:00
Mike Snitzer
f402693d06 dm thin: fix table output when pool target disables discard passdown internally
When the thin pool target clears the discard_passdown parameter
internally, it incorrectly changes the table line reported to userspace.
This breaks dumb string comparisons on these table lines in generic
userspace device-mapper library code and leads to tables being reloaded
repeatedly when nothing is actually meant to be changing.

This patch corrects this by no longer changing the table line when
discard passdown was disabled.

We can still tell when discard passdown is overridden by looking for the
message "Discard unsupported by data device (sdX): Disabling discard passdown."

This automatic detection is also moved from the 'load' to the 'resume'
so that it is re-evaluated should the properties of underlying devices
change.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-05-19 01:01:01 +01:00
Linus Torvalds
2f05af8b59 Fix bug in recent fix to RAID10.
Without this patch, recovery will crash
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIVAwUAT7bVGznsnt1WYoG5AQL0ohAAvpF47NkSnuqw9gko25/Ndr5akrIC9kfz
 YNzxsd0FvNSEQipGS4KgBcRxnFxbcVkmHL+pN8McmiDxNew1LW2+zdtV47RJngnQ
 2slTUiSBLSHcvnOFblBrwIh+XpMdhv4FEMG13Si4tQaETtHp3JiNDr2JqvwbuKbo
 aO/nW38DFNjEcB59X+9npknZYaymvavxso4J7iH/ec0Iys2c2h+jX5afewCXnbYr
 Q6X2YlySUiUM3AXbgV0QI/w6xDM0TD+WWguHq411pgU1wHMYpHelYYwRGHSU73TJ
 061K5WB83Q53A8czeRXK33x0xGr/AhQesTel+mlV3eODHoGSnsYxH8Zja7sV220h
 HgTICrSvAFL2cvBz8dqJQ59KYH3yNo4Cie5xzmBxCuX6yyTxizgrQTp7L/HUy7RL
 eL1SqNFTRJcwSl3MwmgbL57saBLw+/ejV+og7PNBCXSS9gZg39uJyn1uTqJZTrcL
 PuBaizNG9HCYuR/pRTMsQGNWrksqk9r+hxwCJDIGb7xh9SBFI82YtQfK9F5fSB84
 vdmIec8yUvwS/Gxhz+I+YB3kqk/nwfFRbVGfBWvJdRBOR0DsZphsBapsJ83LZqFb
 VAa2nQL+NoHOweixv6RgiH4Y96SNHdWkvDJjj7R01BUDdpHeHMZuff07Effgqd64
 Knk64XvWKQc=
 =s7vy
 -----END PGP SIGNATURE-----

Merge tag 'md-3.4-fixes' of git://neil.brown.name/md

Pull one more md bugfix from NeilBrown:
 "Fix bug in recent fix to RAID10.

  Without this patch, recovery will crash"

* tag 'md-3.4-fixes' of git://neil.brown.name/md:
  md/raid10: fix transcription error in calc_sectors conversion.
2012-05-18 16:19:59 -07:00
Linus Torvalds
8394edf371 Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile tree bugfix from Chris Metcalf:
 "This fixes a security vulnerability (and correctness bug) in tilegx"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tilegx: enable SYSCALL_WRAPPERS support
2012-05-18 16:16:42 -07:00
NeilBrown
b0d634d568 md/raid10: fix transcription error in calc_sectors conversion.
The old code was
		sector_div(stride, fc);
the new code was
		sector_dir(size, conf->near_copies);

'size' is right (the stride various wasn't really needed), but
'fc' means 'far_copies', and that is an important difference.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-19 09:01:13 +10:00
Linus Torvalds
73f1f5dd3e Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
  frv: delete incorrect task prototypes causing compile fail
  slub: missing test for partial pages flush work in flush_all()
  fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
  drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
2012-05-18 15:56:25 -07:00
Linus Torvalds
30a08bf2d3 proc: move fd symlink i_mode calculations into tid_fd_revalidate()
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.

Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around.  Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.

Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a2878402: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds.  And it
just makes sense to update i_mode in the same place we update i_uid/gid.

Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead.  However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor.  We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18 14:06:17 -07:00
Eric Dumazet
d4b1133558 pktgen: fix module unload for good
commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.

1) list_splice() arguments were in the wrong order

2) list_splice(list, head) has undefined behavior if head is not
initialized.

3) We should use the list_splice_init() variant to clear pktgen_threads
list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 13:54:33 -04:00
Chris Metcalf
e6d9668e11 tilegx: enable SYSCALL_WRAPPERS support
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values.  The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.

On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6.  The general issue is tracked as CVE-2009-0029.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-05-18 13:33:24 -04:00
Linus Torvalds
3d9944978e Fix machine check recovery
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPtS5qAAoJEKurIx+X31iB+bAP/1FjUa2Nd53X89HFc6DoLktA
 4AshM/JENhAfSbpTfGhg10ZOuwUa8sQ85Sf6yz1CsW6mEiJK/bDFrR1g2KrmejyL
 owgQvV6ukPABzfB27tWyXSVVBPmLkJviedLDautVpgPEPVuqauntmpe7fW51b5pf
 92MxvYZ6AzgbDIjVaXP7e+kPomvgyM1C/UEvCgoyEcw81h5dchU9NSdXNBS67JS/
 uOsMiMJyoNI46haYYbyFgMq3RmpYuxTLFj7qFDlUltyjP+vIvyLs38Ae/vkRMNfV
 sYXWRUQlRpvqg4MDIFVZx8FWTufzTm0BMS+Be7JkXKWdF3DAksq6FprOWIxfYi+d
 PMxwTFeSJzTINb9n9MiLt3TmuRy3mu37QWd28qJaJciNMkYWbclPqyJmjwsuAMKg
 hKSy2FvewIDHTAGOkwaVjS+L8O7j3TNRIAbweFA1d1K4rt6oSfwdn6GZrAb6MTvx
 oV0Fe1nAyY9mucyjknBTim3RYZ9qQ7H9SjL8JoaGihSzi988MgBun+iEQOQiwjNl
 YJNGIv0wb31qCKFtxU0A8rA0sFdhQRoLgigJfETb5a+2ghtG323oH6ZVWTnB8bp/
 g1XOpus222/iJdL2AizMiJeUNV1IZ0SeLn2GoTFQIy9Uf11Zdgu5wLJumUva8EC6
 JAACbpBlYufftIn278OH
 =tk2M
 -----END PGP SIGNATURE-----

Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull a machine check recovery fix from Tony Luck.

I really don't like how the MCE code does some of the things it does,
but this does seem to be an improvement.

* tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Only restart instruction after machine check recovery if it is safe
2012-05-18 09:42:20 -07:00
Paul Gortmaker
93c2d656c7 frv: delete incorrect task prototypes causing compile fail
Commit 41101809a8 ("fork: Provide weak arch_release_[task_struct|
thread_info] functions") in -tip highlights a problem in the frv arch,
where it has needles prototypes for alloc_task_struct_node and
free_task_struct.  This now shows up as:

  kernel/fork.c:120:66: error: static declaration of 'alloc_task_struct_node' follows non-static declaration
  kernel/fork.c:127:51: error: static declaration of 'free_task_struct' follows non-static declaration

since that commit turned them into real functions.  Since arch/frv does
does not define define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR (i.e.  it just
uses the generic ones) it shouldn't list these at all.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
majianpeng
02e1a9cd1e slub: missing test for partial pages flush work in flush_all()
I found some kernel messages such as:

    SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
    Pid: 6143, comm: mdadm Tainted: G           O 3.4.0-rc6+        #75
    Call Trace:
    kmem_cache_destroy+0x328/0x400
    free_conf+0x2d/0xf0 [raid456]
    stop+0x41/0x60 [raid456]
    md_stop+0x1a/0x60 [md_mod]
    do_md_stop+0x74/0x470 [md_mod]
    md_ioctl+0xff/0x11f0 [md_mod]
    blkdev_ioctl+0xd8/0x7a0
    block_ioctl+0x3b/0x40
    do_vfs_ioctl+0x96/0x560
    sys_ioctl+0x91/0xa0
    system_call_fastpath+0x16/0x1b

Then using kmemleak I found these messages:

    unreferenced object 0xffff8800b6db7380 (size 112):
      comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
      hex dump (first 32 bytes):
        01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff  .....N..........
        ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff  .........@J.....
      backtrace:
        kmemleak_alloc+0x21/0x50
        kmem_cache_alloc+0xeb/0x1b0
        kmem_cache_open+0x2f1/0x430
        kmem_cache_create+0x158/0x320
        setup_conf+0x649/0x770 [raid456]
        run+0x68b/0x840 [raid456]
        md_run+0x529/0x940 [md_mod]
        do_md_run+0x18/0xc0 [md_mod]
        md_ioctl+0xba8/0x11f0 [md_mod]
        blkdev_ioctl+0xd8/0x7a0
        block_ioctl+0x3b/0x40
        do_vfs_ioctl+0x96/0x560
        sys_ioctl+0x91/0xa0
        system_call_fastpath+0x16/0x1b

This bug was introduced by commit a8364d5555 ("slub: only IPI CPUs that
have per cpu obj to flush"), which did not include checks for per cpu
partial pages being present on a cpu.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Cyrill Gorcunov
eb94cd96e0 fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.4.0-rc4-24406-g841e6a6 #121 Not tainted
  -------------------------------------------------------
  bash/1556 is trying to acquire lock:
   (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1

  but task is already holding lock:
   (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&sig->cred_guard_mutex){+.+.+.}:
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_killable_nested+0x40/0x45
         lock_trace+0x24/0x59
         proc_map_files_lookup+0x5a/0x165
         __lookup_hash+0x52/0x73
         do_lookup+0x276/0x2b1
         walk_component+0x3d/0x114
         do_last+0xfc/0x540
         path_openat+0xd3/0x306
         do_filp_open+0x3d/0x89
         do_sys_open+0x74/0x106
         sys_open+0x21/0x23
         tracesys+0xdd/0xe2

  -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
         check_prev_add+0x6a/0x1ef
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_nested+0x40/0x45
         do_lookup+0x267/0x2b1
         walk_component+0x3d/0x114
         link_path_walk+0x1f9/0x48f
         path_openat+0xb6/0x306
         do_filp_open+0x3d/0x89
         open_exec+0x25/0xa0
         do_execve_common+0xea/0x2f9
         do_execve+0x43/0x45
         sys_execve+0x43/0x5a
         stub_execve+0x6c/0xc0

This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.

Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Rajkumar Kasirajan
c0a5f4a05a drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
The reset date of the ST Micro version of PL031 is 2000-01-01.  The
correct weekday for 2000-01-01 is saturday, but pl031 is initialized to
sunday.  This may lead to alarm malfunction, so configure the correct
wday if RTC_DR indicates reset.

Signed-off-by: Rajkumar Kasirajan <rajkumar.kasirajan@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Mattias Wallin <mattias.wallin@stericsson.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Linus Torvalds
42ea7d7f2a Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
 "Small set of fixes again."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
  ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
  ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
  ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
2012-05-17 16:52:29 -07:00
Linus Torvalds
39c2028531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull two networking fixes from David S. Miller:

1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
   been present in do_tcp_sendpages() since that function was written in
   2002.

   When we block to wait for memory we have to unconditionally try and
   push out pending TCP data, otherwise we can block for an unreasonably
   long amount of time.

2) Fix deadlock in e1000, fixes kernel bugzilla 43132

   From Tushar Dave.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  e1000: Prevent reset task killing itself.
  tcp: do_tcp_sendpages() must try to push data out on oom conditions
2012-05-17 16:30:26 -07:00
Rafael J. Wysocki
5c7dd710f6 ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold
Commit 1cc0c998fd ("ACPI: Fix D3hot v D3cold confusion") introduced a
bug in __acpi_bus_set_power() and changed the behavior of
acpi_pci_set_power_state() in such a way that it generally doesn't work
as expected if PCI_D3hot is passed to it as the second argument.

First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
__acpi_bus_set_power() and the explicit_set flag is set for the D3cold
state, the function will try to execute AML method called "_PS4", which
doesn't exist.

Fix this by adding a check to ensure that the name of the AML method
to execute for transitions to ACPI_STATE_D3_COLD is correct in
__acpi_bus_set_power().  Also make sure that the explicit_set flag
for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
acpi_power_transition() to avoid accessing power resources for
ACPI_STATE_D3_COLD, because they don't exist.

Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
target state, the function will request a transition to
ACPI_STATE_D3_HOT instead of ACPI_STATE_D3.  However,
ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
method is defined for the given device, which is rare.  This causes
problems to happen on systems where devices were successfully put
into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work
now.  In particular, some unused graphics adapters are not turned
off as a result.

To fix this issue restore the old behavior of
acpi_pci_set_power_state(), which is to request a transition to
ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
PCI_D3cold is passed to it as the argument.

This approach is not ideal, because generally power should not
be removed from devices if PCI_D3hot is the target power state,
but since this behavior is relied on, we have no choice but to
restore it at the moment and spend more time on designing a
better solution in the future.

References: https://bugzilla.kernel.org/show_bug.cgi?id=43228
Reported-by: rocko <rockorequin@hotmail.com>
Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reported-and-tested-by: Peter <lekensteyn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 16:16:16 -07:00
Tushar Dave
8ce6909f77 e1000: Prevent reset task killing itself.
Killing reset task while adapter is resetting causes deadlock.
Only kill reset task if adapter is not resetting.
Ref bug #43132 on bugzilla.kernel.org

CC: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 18:32:41 -04:00