After applied the commit (4a092d73), we have reduced the number of
source files that need to #include ext4_extents.h. But we can do
better.
This commit defines ext4_zeroout_es() in extents.c and move
EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
indirect.c and ioctl.c. Meanwhile we just need to include this file in
extent_status.c when ES_AGGRESSIVE_TEST is defined. Otherwise, this
commit removes a duplicated declaration in trace/events/ext4.h.
After applied this patch, we just need to include ext4_extents.h file
in {super,migrate,move_extents,extents}.c, and it is easy for us to
define a new extent disk layout.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Use wait_for_stable_page() instead of wait_on_page_writeback()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
If ext_debugging is enabled and path[depth].p_ext is NULL, len
and lblock are printed non initialized
Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The following race can lead to a loss of i_disksize update from truncate
thus resulting in a wrong inode size if the inode size isn't updated
again before inode is reclaimed:
ext4_setattr() mpage_map_and_submit_extent()
EXT4_I(inode)->i_disksize = attr->ia_size;
... ...
disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
/* False because i_size isn't
* updated yet */
if (disksize > i_size_read(inode))
/* True, because i_disksize is
* already truncated */
if (disksize > EXT4_I(inode)->i_disksize)
/* Overwrite i_disksize
* update from truncate */
ext4_update_i_disksize()
i_size_write(inode, attr->ia_size);
For other places updating i_disksize such race cannot happen because
i_mutex prevents these races. Writeback is the only place where we do
not hold i_mutex and we cannot grab it there because of lock ordering.
We fix the race by doing both i_disksize and i_size update in truncate
atomically under i_data_sem and in mpage_map_and_submit_extent() we move
the check against i_size under i_data_sem as well.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Merge conditions in ext4_setattr() handling inode size changes, also
move ext4_begin_ordered_truncate() call somewhat earlier because it
simplifies error recovery in case of failure. Also add error handling in
case i_disksize update fails.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Inode size can arbitrarily change while writeback is in progress. When
ext4_writepages() has prepared a long extent for mapping and truncate
then reduces i_size, mpage_map_and_submit_buffers() will always map just
one buffer in a page instead of all of them due to lblk < blocks check.
So we end up not using all blocks we've allocated (thus leaking them)
and also delalloc accounting goes wrong manifesting as a warning like:
ext4_da_release_space:1333: ext4_da_release_space: ino 12, to_free 1
with only 0 reserved data blocks
Note that the problem can happen only when blocksize < pagesize because
otherwise we have only a single buffer in the page.
Fix the problem by removing the size check from the mapping loop. We
have an extent allocated so we have to use it all before checking for
i_size. We also rename add_page_bufs_to_extent() to
mpage_process_page_bufs() and make that function submit the page for IO
if all buffers (upto EOF) in it are mapped.
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Zheng Liu <gnehzuil.liu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Currently the logic whether the current buffer can be added to an extent
of buffers to map is split between mpage_add_bh_to_extent() and
add_page_bufs_to_extent(). Move the whole logic to
mpage_add_bh_to_extent() which makes things a bit more straightforward
and make following i_size fixes easier.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
reaim workfile.dbase test easily triggers warning in
ext4_da_update_reserve_space():
EXT4-fs warning (device ram0): ext4_da_update_reserve_space:365:
ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1
blocks with reserved 9 data blocks)
The problem is that (one of) tests creates file and then randomly writes
to it with O_SYNC. That results in writing back pages of the file in
random order so we create extents for written blocks say 0, 2, 4, 6, 8
- this last allocation also allocates new block for extents. Then we
writeout block 1 so we have extents 0-2, 4, 6, 8 and we release
indirect extent block because extents fit in the inode again. Then we
writeout block 10 and we need to allocate indirect extent block again
which triggers the warning because we don't have the reservation
anymore.
Fix the problem by giving back freed metadata blocks resulting from
extent merging into inode's reservation pool.
Signed-off-by: Jan Kara <jack@suse.cz>
ext4 needs to convert allocated (metadata) blocks back into blocks
reserved for delayed allocation. Add functions into quota code for
supporting such operation.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In no journal mode, if an inode has recently been deleted, we
shouldn't reuse it right away. Otherwise it's possible, after an
unclean shutdown, to hit a situation where a recently deleted inode
gets reused for some other purpose before the inode table block has
been written to disk. However, if the directory entry has been
updated, then the directory entry will be pointing at the old inode
contents.
E2fsck will make sure the file system is consistent after the
unclean shutdown. However, if the recently deleted inode is a
character mode device, or an inode with the immutable bit set, even
after the file system has been fixed up by e2fsck, it can be
possible for a *.pyc file to be pointing at a character mode
device, and when python tries to open the *.pyc file, Hilarity
Ensues. We could change all of userspace to be very suspicious
about stat'ing files before opening them, and clearing the
immutable flag if necessary --- or we can just avoid reusing an
inode number if it has been recently deleted.
Google-Bug-Id: 10017573
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When ext4_rename() overwrites an already existing file, call
ext4_alloc_da_blocks() before starting the journal handle which
actually does the rename, instead of doing this afterwards. This
improves the likelihood that the contents will survive a crash if an
application replaces a file using the sequence:
1) write replacement contents to foo.new
2) <omit fsync of foo.new>
3) rename foo.new to foo
It is still not a guarantee, since ext4_alloc_da_blocks() is *not*
doing a file integrity sync; this means if foo.new is a very large
file, it may not be completely flushed out to disk.
However, for files smaller than a megabyte or so, any dirty pages
should be flushed out before we do the rename operation, and so at the
next journal commit, the CACHE FLUSH command will make sure al of
these pages are safely on the disk platter.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
In ext4_rename(), don't start the journal handle until the the
directory entries have been successfully looked up.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a new fiemap flag which forces the all of the extents in an inode
to be cached in the extent_status tree. This is critically important
when using AIO to a preallocated file, since if we need to read in
blocks from the extent tree, the io_submit(2) system call becomes
synchronous, and the AIO is no longer "A", which is bad.
In addition, for most files which have an external leaf tree block,
the cost of caching the information in the extent status tree will be
less than caching the entire 4k block in the buffer cache. So it is
generally a win to keep the extent information cached.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When we read in an extent tree leaf block from disk, arrange to have
all of its entries cached. In nearly all cases the in-memory
representation will be more compact than the on-disk representation in
the buffer cache, and it allows us to get the information without
having to traverse the extent tree for successive extents.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Don't use an unsigned long long for the es_status flags; this requires
that we pass 64-bit values around which is painful on 32-bit systems.
Instead pass the extent status flags around using the low 4 bits of an
unsigned int, and shift them into place when we are reading or writing
es_pblk.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
When we find an invalid extent tree block, report the block number of
the bad block for debugging purposes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Refactor out the code needed to read the extent tree block into a
single read_extent_tree_block() function. In addition to simplifying
the code, it also makes sure that we call the ext4_ext_load_extent
tracepoint whenever we need to read an extent tree block from disk.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Commit 0713ed0cde added
jbd2_journal_file_inode() call into ext4_block_zero_page_range().
However that function gets called from truncate path and thus inode
needn't have jinode attached - that happens in ext4_file_open() but
the file needn't be ever open since mount. Calling
jbd2_journal_file_inode() without jinode attached results in the oops.
We fix the problem by attaching jinode to inode also in ext4_truncate()
and ext4_punch_hole() when we are going to zero out partial blocks.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When jbd2_journal_dirty_metadata() returns error,
__ext4_handle_dirty_metadata() stops the handle. However callers of this
function do not count with that fact and still happily used now freed
handle. This use after free can result in various issues but very likely
we oops soon.
The motivation of adding __ext4_journal_stop() into
__ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to
improve error reporting. So replace __ext4_journal_stop() with
ext4_journal_abort_handle() which was there before that commit and add
WARN_ON_ONCE() to dump stack to provide useful information.
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.2+
Previously we weren't swapping only some of the extent_status LRU
fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl. The
much safer thing to do is to just completely flush the extent status
tree when doing the swap.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: stable@vger.kernel.org
Commit 5688978 ("ext4: improve handling of conflicting mount options")
introduced incorrect messages shown while choosing wrong mount options.
First of all, both cases of incorrect mount options,
"data=journal,delalloc" and "data=journal,dioread_nolock" result in
the same error message.
Secondly, the problem above isn't solved for remount option: the
mismatched parameter is simply ignored. Moreover, ext4_msg states
that remount with options "data=journal,delalloc" succeeded, which is
not true.
To fix it up, I added a simple check after parse_options() call to
ensure that data=journal and delalloc/dioread_nolock parameters are
not present at the same time.
Signed-off-by: Piotr Sarna <p.sarna@partner.samsung.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Commit 26092bf ("ext4: use a table-driven handler for mount options")
wrongly disallows the specifying the mount options nodelalloc and
data=journal simultaneously. This is incorrect; it should have only
disallowed the combination of delalloc and data=journal
simultaneously.
Reported-by: Piotr Sarna <p.sarna@partner.samsung.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
In commit 921f266b: ext4: add self-testing infrastructure to do a
sanity check, some sanity checks were added in map_blocks to make sure
'retval == map->m_len'.
Enable these checks by default and report any assertion failures using
ext4_warning() and WARN_ON() since they can help us to figure out some
bugs that are otherwise hard to hit.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When we try to allocate an inode, and there is a race between two
CPU's trying to grab the same inode, _and_ this inode is the last free
inode in the block group, make sure the group number is bumped before
we continue searching the rest of the block groups. Otherwise, we end
up searching the current block group twice, and we end up skipping
searching the last block group. So in the unlikely situation where
almost all of the inodes are allocated, it's possible that we will
return ENOSPC even though there might be free inodes in that last
block group.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
- Change from Aaron Lu makes ACPICA export a variable which can be
used by driver code to determine whether or not the BIOS believes
that we are compatible with Windows 8.
- Change from Matthew Garrett makes the ACPI video driver initialize
the ACPI backlight even if it is not going to be used afterward
(that is needed for backlight control to work on Thinkpads).
- Fix from Rafael J Wysocki implements Windows 8 backlight support
workaround making i915 take over bakclight control if the firmware
thinks it's dealing with Windows 8. Based on the work of multiple
developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
and Aaron Lu.
- Fix from Aaron Lu makes the kernel follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled
by GUI.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR6yY7AAoJEKhOf7ml8uNsbGUQAIVXwX8HF+9AOnqEIQYEaBiF
HqfLDtHS5qobraK06auF/YmVaA17RdUnHssTuGiEbtIxpiUbuLPJaecZ9BeAf0Pz
V4Y2IxF27aF9TDZrzkZXHcnYflzQ/kxj36eR9AmM2vSXmKZKxhfqLMeihVh2GgMv
IlOs9PltK2GNX6C/CzjUQuUj4TYw8yxXsG93Gta0Z8scmxR7mpq9a9d0cPU/TjN/
iatIhZLFU8ujp8xFiG9MDeG948AtperLu3g0v1D4mPnkmDJTuyMuE3FiioKL2zMY
7jh6mDPkWUYdjdZkJcmyzgKZh5lAlZIJTZnJV6TrW5fjIpUz5F8XeD4ArMVU/u+A
smro6XFcpgToRZTtmaEuraxzJHCS44FTjlXyH01FSIiN/Ll6YKyxDYsAzz4Z2sf6
X5BJofAAiRelZh/o1MaMQzs3QeTUo44TaboGr2zka0cQ37KK9+8YRGYqcWo/BvGs
sicgFKMeA6SANxHnCVNTslQzMYFrPp4yyMEu4gD7EE+U7cG6FSVhVHQjjTO9CmBd
ZnX2EDX0Uy+oHTQ9BkyjWAD7rXF6StnOm37FPWHNZ+HnplHEoQQAn+vXsSfl9tbO
7CPefZ/LQQEo1PZwLLkxruZ67NgxOd8/9I/aVjLUKgd8CDjez0UJVJ65gIXl1J4V
kvaDy6faYTnUF8h/AYJf
=vlwF
-----END PGP SIGNATURE-----
Merge tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI video support fixes from Rafael Wysocki:
"I'm sending a separate pull request for this as it may be somewhat
controversial. The breakage addressed here is not really new and the
fixes may not satisfy all users of the affected systems, but we've had
so much back and forth dance in this area over the last several weeks
that I think it's time to actually make some progress.
The source of the problem is that about a year ago we started to tell
BIOSes that we're compatible with Windows 8, which we really need to
do, because some systems shipping with Windows 8 are tested with it
and nothing else, so if we tell their BIOSes that we aren't compatible
with Windows 8, we expose our users to untested BIOS/AML code paths.
However, as it turns out, some Windows 8-specific AML code paths are
not tested either, because Windows 8 actually doesn't use the ACPI
methods containing them, so if we declare Windows 8 compatibility and
attempt to use those ACPI methods, things break. That occurs mostly
in the backlight support area where in particular the _BCM and _BQC
methods are plain unusable on some systems if the OS declares Windows
8 compatibility.
[ The additional twist is that they actually become usable if the OS
says it is not compatible with Windows 8, but that may cause
problems to show up elsewhere ]
Investigation carried out by Matthew Garrett indicates that what
Windows 8 does about backlight is to leave backlight control up to
individual graphics drivers. At least there's evidence that it does
that if the Intel graphics driver is used, so we've decided to follow
Windows 8 in that respect and allow i915 to control backlight (Daniel
likes that part).
The first commit from Aaron Lu makes ACPICA export the variable from
which we can infer whether or not the BIOS believes that we are
compatible with Windows 8.
The second commit from Matthew Garrett prepares the ACPI video driver
by making it initialize the ACPI backlight even if it is not going to
be used afterward (that is needed for backlight control to work on
Thinkpads).
The third commit implements the actual workaround making i915 take
over backlight control if the firmware thinks it's dealing with
Windows 8 and is based on the work of multiple developers, including
Matthew Garrett, Chun-Yi Lee, Seth Forshee, and Aaron Lu.
The final commit from Aaron Lu makes us follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled by
GUI.
Hopefully, this approach will allow us to avoid using blacklists of
systems that should not declare Windows 8 compatibility just to avoid
backlight control problems in the future.
- Change from Aaron Lu makes ACPICA export a variable which can be
used by driver code to determine whether or not the BIOS believes
that we are compatible with Windows 8.
- Change from Matthew Garrett makes the ACPI video driver initialize
the ACPI backlight even if it is not going to be used afterward
(that is needed for backlight control to work on Thinkpads).
- Fix from Rafael J Wysocki implements Windows 8 backlight support
workaround making i915 take over bakclight control if the firmware
thinks it's dealing with Windows 8. Based on the work of multiple
developers including Matthew Garrett, Chun-Yi Lee, Seth Forshee,
and Aaron Lu.
- Fix from Aaron Lu makes the kernel follow Windows 8 by informing
the firmware through the _DOS method that it should not carry out
automatic brightness changes, so that brightness can be controlled
by GUI"
* tag 'acpi-video-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: no automatic brightness changes by win8-compatible firmware
ACPI / video / i915: No ACPI backlight if firmware expects Windows 8
ACPI / video: Always call acpi_video_init_brightness() on init
ACPICA: expose OSI version
support (along with a similar fix for ext3)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR60efAAoJENNvdpvBGATwLnEQAMABvjWRdJ/u8mSVm/j5K17j
y8+PqdZslLBq7NEfcpQBfeajXC0ouQETVWs6HhE9LeZcfgSQdqkR7/2fPJp/QJ7h
eY/DUykNavuLVfWgvAkK3DyJz6WK+m2H8sBlqTblqVDsrVPtGytPxpPiGmzbcRlU
9aKoJcGkR7YtYWy6lY/tPDHUYByHE54LWER/fSdrLrOu0F5h7bLTGImZclLRM8wO
kWmjXjDVsgpBNGpn/6eNSaqWGXnECbPcE+b+DPNKEYhU0WnzvgPY0Xw2fV2E9+IK
0eEp7Z248u+mxW5Na9V8SZ8EpSGTQvm7hpqHnDPQGwGNn89j4w9n7qX8IoeUKUf0
0Rk72FHBXeDh3juaQekytrzXlHescpiNwYNlDQNmfezsDotHvq5h+29GGoI+Cv8X
Tt5JoBsexoUJzHjRReAegTjR0B+MTwqmGSx7KH5Rm02KKzWjKXzKcUsefztEktDw
Z4E8cKMG+rUGBgCGckjiBX00OMjlmph8PomXligCY9aaI2f5F1ZjXgnaYpCDq6ez
xU16lxvhI9eWzAnn7kIA1I/6q2DRzLxjaTKYsqFm0Jf9sLILumPpusGd2sgq7BBD
NAo8nC5Nuhq5EB4Ik0KQ/F+BUAkKRI6UB5zkC1sfmZFB6AfheO4HQ2Qp5PWafCJ5
P6h0IDbrVZnF3c+4hW/m
=CV0e
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext[34] tmpfile bugfix from Ted Ts'o:
"Fix regression caused by commit af51a2ac36 which added ->tmpfile()
support (along with a similar fix for ext3)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext3: fix a BUG when opening a file with O_TMPFILE flag
ext4: fix a BUG when opening a file with O_TMPFILE flag
Here are a few iio driver fixes for 3.11-rc2. They are still spread
across drivers/iio and drivers/staging/iio so they are coming in through
this tree.
I've also removed the drivers/staging/csr/ driver as the developers who
originally sent it to me have moved on to other companies, and CSR still
will not send us the specs for the device, making the driver pretty much
obsolete and impossible to fix up. Deleting it now prevents people from
sending in lots of tiny codingsyle fixes that will never go anywhere.
It also helps to offset the large lustre filesystem merge that happened
in 3.11-rc1 in the overall 3.11.0 diffstat. :)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iEYEABECAAYFAlHq64AACgkQMUfUDdst+ynysQCgwUM8pbZ7iJGDJAW2TahKwVie
f5MAnRR8DokyE7iWwXDEx5UYVPerMrpq
=9ORA
-----END PGP SIGNATURE-----
Merge tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging tree fixes from Greg KH:
"Here are a few iio driver fixes for 3.11-rc2. They are still spread
across drivers/iio and drivers/staging/iio so they are coming in
through this tree.
I've also removed the drivers/staging/csr/ driver as the developers
who originally sent it to me have moved on to other companies, and CSR
still will not send us the specs for the device, making the driver
pretty much obsolete and impossible to fix up. Deleting it now
prevents people from sending in lots of tiny codingsyle fixes that
will never go anywhere.
It also helps to offset the large lustre filesystem merge that
happened in 3.11-rc1 in the overall 3.11.0 diffstat. :)"
* tag 'staging-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: csr: remove driver
iio: lps331ap: Fix wrong in_pressure_scale output value
iio staging: fix lis3l02dq, read error handling
staging:iio:ad7291: add missing .driver_module to struct iio_info
iio: ti_am335x_adc: add missing .driver_module to struct iio_info
iio: mxs-lradc: Remove useless check in read_raw
iio: mxs-lradc: Fix misuse of iio->trig
iio: inkern: fix iio_convert_raw_to_processed_unlocked
iio: Fix iio_channel_has_info
iio:trigger: device_unregister->device_del to avoid double free
iio: dac: ad7303: fix error return code in ad7303_probe()
Pull vfs fixes from Al Viro:
"The sget() one is a long-standing bug and will need to go into -stable
(in fact, it had been originally caught in RHEL6), the other two are
3.11-only"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: constify dentry parameter in d_count()
livelock avoidance in sget()
allow O_TMPFILE to work with O_WRONLY
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR6cmlAAoJENNvdpvBGATwZF0P/0a7ET511UJwQbgAIq5ftFlj
86Bzvy28xo2T85t64L+Ib2XDehWHk0sZlQpB/gK8MLYn4rCRWCxkQAshKwoequsC
AhuvQ7NtX9vJNCSR30+RrLhkvj6UKsMuM724adARLBUgMBoScABzZImR1e14ELah
bN27a4Bk2aNUpNX68QYdQX3TGiHGZy//lNmh81JTxFS3Moqm6bIZAJbYpOslATsI
Q5nti/TjQJKso2gF7Jx7NffXv0g5rGxaVQEZJPpfIv1Vs0b6vabK/sYp608ayM0K
qKyjJABaHR1Pzb16V82ZqvSlsHm/ARhCF1nMM6gQ8nwl/plxcQ6Jvd/qJsNej3b/
7Jfm86xLe+G0G5oeNEJXsoEFAsvxug6ZRMfyoRHaPlGIksmz+Jc9kzTtM3qzdzOB
5OPJwlONlM4dRVA6rgb7KiuE3h/sRt4CctFejD0f6mUqKa+B+zyHq/a/8a+60IqQ
/sDiTQrqrI6LWxECFasDNoGxtnvVtKC21jbg+MTzumZDvjgnJIFFe5NrinI6SB9x
VQYVq/vVkE576VTwGAttTg3s4sRwQKd/iuQjuoP76iFFHvq/sNX6fBq0NW5gpsj2
WAfH+fLQsMcVJ2MAcc3DwdBT1wQbLu+Y19hv4TDOZRmnKGhq9K08hzWR4tIUKdFJ
UcjWk35Wuoz1IGpVlHJ5
=ngfz
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"Fixes for 3.11-rc2, sent at 5pm, in the professoinal style. :-)"
I'm not sure I like this new level of "professionalism".
9-5, people, 9-5.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: call ext4_es_lru_add() after handling cache miss
ext4: yield during large unlinks
ext4: make the extent_status code more robust against ENOMEM failures
ext4: simplify calculation of blocks to free on error
ext4: fix error handling in ext4_ext_truncate()
- Fix a regression against NFSv4 FreeBSD servers when creating a new file
- Fix another regression in rpc_client_register()
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR6ZKMAAoJEGcL54qWCgDyQX8P/19LKLNKcL+y2zVGjLbXMTq0
TpyWdBO0ux7QcqnPEDg+Jpvu62IowYiKTtaSOXtHb5BNjQMBo2RKw3B0eMBoCp/z
6gHmQRD2hMgqwBxBwHceV+dNwueCUiZW7GqaaNh6/3bpGQefegdONnLEifuPogEu
oZmEuiVrGDfITEF7D4k5+shXCQN4eNH0LFuIQo4XXdCqmK6PwvOsidZ7YwHVC3Mg
/Jzda2YsCxHj8kPi1xb9skPPAn6g4kdfYfyr/xSY7IviPixrkg/nEEK1b8xHU81e
a0dd0Yx5kq6fR8LsBvQCHdj2m7doHM15jf5Np5G7VnnaWEjB2y+QftkxWc9lCNU3
t2fr9YVD7ZG/GGNSFePHAHmBY0OqDB1Htp4vcwEQfzX6CAR3Hel82WVvut62Z6m4
G5qHjwdqUFhmRN//SWlDpEqSn+pbeCvPhQS60ayN0TLivRsscm/I4yA75odAnn9b
4su1IcUpqeJGeV6yDyMUqbx4kYZFyCZg/DNkThXiTKOs47A7ogSS9ev2fTB/V+jd
rroNHNd/U508ze9D6D4ai9vR78uUp4wKNSSBZMCkBtNh0uSApOTgyGVhertB1EKS
vgAr4T1tc+9t+0qg1Sb+hbKyBM/KaS5zUrPn+APHPoBXPh5PSVBzeNJkpxHRw/V0
ZxkEgSQKLZSXYb5ab770
=XE+7
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix a regression against NFSv4 FreeBSD servers when creating a new
file
- Fix another regression in rpc_client_register()
* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix a regression against the FreeBSD server
SUNRPC: Fix another issue with rpc_client_register()
Pull btrfs fixes from Josef Bacik:
"I'm playing the role of Chris Mason this week while he's on vacation.
There are a few critical fixes for btrfs here, all regressions and
have been tested well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next:
Btrfs: fix wrong write offset when replacing a device
Btrfs: re-add root to dead root list if we stop dropping it
Btrfs: fix lock leak when resuming snapshot deletion
Btrfs: update drop progress before stopping snapshot dropping
so that it can be used in places like d_compare/d_hash
without causing a compiler warning.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Eric Sandeen has found a nasty livelock in sget() - take a mount(2) about
to fail. The superblock is on ->fs_supers, ->s_umount is held exclusive,
->s_active is 1. Along comes two more processes, trying to mount the same
thing; sget() in each is picking that superblock, bumping ->s_count and
trying to grab ->s_umount. ->s_active is 3 now. Original mount(2)
finally gets to deactivate_locked_super() on failure; ->s_active is 2,
superblock is still ->fs_supers because shutdown will *not* happen until
->s_active hits 0. ->s_umount is dropped and now we have two processes
chasing each other:
s_active = 2, A acquired ->s_umount, B blocked
A sees that the damn thing is stillborn, does deactivate_locked_super()
s_active = 1, A drops ->s_umount, B gets it
A restarts the search and finds the same superblock. And bumps it ->s_active.
s_active = 2, B holds ->s_umount, A blocked on trying to get it
... and we are in the earlier situation with A and B switched places.
The root cause, of course, is that ->s_active should not grow until we'd
got MS_BORN. Then failing ->mount() will have deactivate_locked_super()
shut the damn thing down. Fortunately, it's easy to do - the key point
is that grab_super() is called only for superblocks currently on ->fs_supers,
so it can bump ->s_count and grab ->s_umount first, then check MS_BORN and
bump ->s_active; we must never increment ->s_count for superblocks past
->kill_sb(), but grab_super() is never called for those.
The bug is pretty old; we would've caught it by now, if not for accidental
exclusion between sget() for block filesystems; the things like cgroup or
e.g. mtd-based filesystems don't have anything of that sort, so they get
bitten. The right way to deal with that is obviously to fix sget()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull UML fixes from Richard Weinberger:
"Special thanks goes to Toralf Föster for continuously testing UML and
reporting issues!"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove dead code
um: siginfo cleanup
uml: Fix which_tmpdir failure when /dev/shm is a symlink, and in other edge cases
um: Fix wait_stub_done() error handling
um: Mark stub pages mapping with VM_PFNMAP
um: Fix return value of strnlen_user()
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 3.11. Half of then is for Netlogic the remainder
touches things across arch/mips.
Nothing really dramatic and by rc1 standards MIPS will be in fairly
good shape with this applied. Tested by building all MIPS defconfigs
of which with this pull request four platforms won't build. And yes,
it boots also on my favorite test systems"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: kvm: Kconfig: Drop HAVE_KVM dependency from VIRTUALIZATION
MIPS: Octeon: Fix DT pruning bug with pip ports
MIPS: KVM: Mark KVM_GUEST (T&E KVM) as BROKEN_ON_SMP
MIPS: tlbex: fix broken build in v3.11-rc1
MIPS: Netlogic: Add XLP PIC irqdomain
MIPS: Netlogic: Fix USB block's coherent DMA mask
MIPS: tlbex: Fix typo in r3000 tlb store handler
MIPS: BMIPS: Fix thinko to release slave TP from reset
MIPS: Delete dead invocation of exception_exit().
- Fixes (user cache maintenance fault handling, !COMPAT compilation, CPU
online and interrupt hanlding).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJR6WGUAAoJEGvWsS0AyF7xaMQP/3WSpcEUQ+k8wCbjzkbpnQJ3
Z0Ufz0XeBUgmaZNwYFjnpVTm/R04F1gcsoA5qE//6iMkbcpbM2sWVN/uQvY3l+fQ
haJmCWZ7PIXxm3vbKrcSBiJ+WKpvUkzlL+Q1upmQWFCxkP6noejCsvazM5lPgt1q
1Vr5/9q87TOfxG8Udki3pPqRazd4YwQJx6JCZ46P+mlqQk9gxDoQ7RRXy9Viv+SV
6Jq5Lt8Qj7iXmq8DDTYdF7DHE7rnIhdCUhJu7UN1G/O7U4s9nlUbv55fHMY76ZAn
BddgNtIsMsvfIxlYZ6f0r2ccrgPHaDTLVW/q0VkwdP8jD2EQWAK7gI21OgQm0ebl
OL+t29zdx9nidpI+cCTlwejh8i7vRsoxgYki5qfnYR3SHL5HhQSHrUTZvEFr+u4I
ceXnDmTZ46HmPSCC6/5cFiXxsw1zbBxSB7rNFvXmF2Jr7F3TvAxCWvrIfmrmYdrC
bw4UMBB15SaJud3maqVGhj6aVo4bEand4g++Dk0ytXbGq+5Ke5CktmPOmzNLVu9p
R8tHDyp9szFP0eqqF1/XTZx2rsHvop7D1QUpQVgFC+mRWI3gjtFPtShW1GgbQQ+P
t7gvHJQ2SF1BRTSf6KH/cqXDgHzpg3VuP2moM7YbR0/qvRf7Eh/M3Hrg/s6xCeKr
ywmW7/+McIz3kDsk1gUM
=M4kj
-----END PGP SIGNATURE-----
Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 fixes from Catalin Marinas:
- Post -rc1 update to the common reboot infrastructure.
- Fixes (user cache maintenance fault handling, !COMPAT compilation,
CPU online and interrupt hanlding).
* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: use common reboot infrastructure
arm64: mm: don't treat user cache maintenance faults as writes
arm64: add '#ifdef CONFIG_COMPAT' for aarch32_break_handler()
arm64: Only enable local interrupts after the CPU is marked online
Pull s390 fixes from Martin Schwidefsky:
"An update for the BFP jit to the latest and greatest, two patches to
get kdump working again, the random-abort ptrace extention for
transactional execution, the z90crypt module alias for ap and a tiny
cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Alias for new zcrypt device driver base module
s390/kdump: Allow copy_oldmem_page() copy to virtual memory
s390/kdump: Disable mmap for s390
s390/bpf,jit: add pkt_type support
s390/bpf,jit: address randomize and write protect jit code
s390/bpf,jit: use generic jit dumper
s390/bpf,jit: call module_free() from any context
s390/qdio: remove unused variable
s390/ptrace: PTRACE_TE_ABORT_RAND
Miao Xie reported the following issue:
The filesystem was corrupted after we did a device replace.
Steps to reproduce:
# mkfs.btrfs -f -m single -d raid10 <device0>..<device3>
# mount <device0> <mnt>
# btrfs replace start -rfB 1 <device4> <mnt>
# umount <mnt>
# btrfsck <device4>
The reason for the issue is that we changed the write offset by mistake,
introduced by commit 625f1c8dc.
We read the data from the source device at first, and then write the
data into the corresponding place of the new device. In order to
implement the "-r" option, the source location is remapped using
btrfs_map_block(). The read takes place on the mapped location, and
the write needs to take place on the unmapped location. Currently
the write is using the mapped location, and this commit changes it
back by undoing the change to the write address that the aforementioned
commit added by mistake.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
If we stop dropping a root for whatever reason we need to add it back to the
dead root list so that we will re-start the dropping next transaction commit.
The other case this happens is if we recover a drop because we will add a root
without adding it to the fs radix tree, so we can leak it's root and commit root
extent buffer, adding this to the dead root list makes this cleanup happen.
Thanks,
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
We aren't setting path->locks[level] when we resume a snapshot deletion which
means we won't unlock the buffer when we free the path. This causes deadlocks
if we happen to re-allocate the block before we've evicted the extent buffer
from cache. Thanks,
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Alex pointed out a problem and fix that exists in the drop one snapshot at a
time patch. If we decide we need to exit for whatever reason (umount for
example) we will just exit the snapshot dropping without updating the drop
progress. So the next time we go to resume we will BUG_ON() because we can't
find the extent we left off at because we never updated it. This patch fixes
the problem.
Cc: stable@vger.kernel.org
Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR6Wm5AAoJEBvWZb6bTYbykGQP/Rg4fF+p7v8eko9dNKJWAqfq
/RQ9mSIRKpwD+kZogPtrBNH/LKwSqaJCtciSp57d+cD9w/9S1pN7eu1nDwSl2RA6
EyOVJ/Pd4JCdaRj+yGA3cjs7tEN3SvDk1BHHAxq830ZS5ekuXRw2QC+gBC2hI/Zq
g442m2ELckcdQXwFcj+FuK4usxhkOL7MT5L64lBF962bAY87L7VL0XkFq9RZCp1B
mJJd5uAok7gNl4LHH/Gu5DbuQTvY33ijJiGl9lT0CXwP9bmZ0WTA9lR2poy8P7eM
R6qBuXaHN4qBm7tdRqPBsQ+S1+yr+JHRTCzYU7UsiXWg/fxu1QX2L77Q8ZbjnhNW
rnKOSVlc3j11b594ygX9uc5Elgr2cl9OBBFdoDDormBo9A3Kp9pGkjbJnl1tunPi
oA4Otx8GwxNsnL7aBWXEoPm70peQFDOAKNK+5p/PBasXsJrGCSqCc2x2+RpaveqR
HhE2z0eMaCpC8ayPofMsP0nIu/Cf53m2Fr6BHZqea2KBz57WAhXOuSe4vzXm3nXg
xBHBpPoNtBzTYjmWJ7vdlLGgzsMkZTOsRmppQT7wwrwGEjFCOV4kwHWOBsxNS5Z8
4eWyRFPLJP4dGRj2FoMZNfMM7/6XpQz5JttzxHXWwpua+AnQ4+ay4A/6FxiNMm0a
hRB9ev+0X2/9Qj3+GFIU
=+0Dk
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Paolo Bonzini:
"This single patch fixes a regression caused by one of the
optimizations introduced in 3.11, which is generally visible only on
AMD processors"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: avoid fast page fault fixing mmio page fault
- Two cpufreq commits from the 3.10 cycle introduced regressions.
The first of them was buggy (it did way much more than it needed
to do) and the second one attempted to fix an issue introduced by
the first one. Fixes from Srivatsa S Bhat revert both.
- If autosleep triggers during system shutdown and the shutdown
callbacks of some device drivers have been called already, it may
crash the system. Fix from Liu Shuo prevents that from happening
by making try_to_suspend() check system_state.
- The ACPI memory hotplug driver doesn't clear its driver_data on
errors which may cause a NULL poiter dereference to happen later.
Fix from Toshi Kani.
- The ACPI namespace scanning code should not try to attach scan
handlers to device objects that have them already, which may confuse
things quite a bit, and it should rescan the whole namespace branch
starting at the given node after receiving a bus check notify event
even if the device at that particular node has been discovered
already. Fixes from Rafael J Wysocki.
- New ACPI video blacklist entry for a system whose initial backlight
setting from the BIOS doesn't make sense. From Lan Tianyu.
- Garbage string output avoindance for ACPI PNP from Liu Shuo.
- Two Kconfig fixes for issues introduced recently in the s3c24xx
cpufreq driver (when moving the driver to drivers/cpufreq) from
Paul Bolle.
- Trivial comment fix in pm_wakeup.h from Chanwoo Choi.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR6Sq+AAoJEKhOf7ml8uNsrGQP/0HRDW+QmTGM8znDTHXngbn9
X3pqlpjEOiCtcmJaSJlD7GwLHMscwWcHKEezteaZ7KUI4mcnysJX6EY5YVbNriDC
xlLcDQn9c6Xx1maCSfp+WMygvqItxZwuc8veRjrT3XtOfCNWS/FlX40Voh63BCAe
GbfQ/HesmUg5CKplyD8/XypLWh5OFXmHzCe8IhrKGfhsZukXdSgSBjwQZMRrEMsQ
kJjDCF8zUu0JisiWqL+xE6IFSKme9i6LBlHpzU0Y1g4RqAqkIbuS0Z3vezOYzoTD
oZjBNa9XAgCS3x0l5g3G0ChgDAU+Mpji/imXA7JysrwbirGFbtPHtQYh2HzpAtnw
Hkah/0ocBM7/w7VTsUQiRsFPdIJTCBLlm6J38x8yh7n84h4nJgOpK69dBLrMwCUZ
f3kid6KIPVLBvnC3QSULrCAKUcUcVVWYtNho+sfXBMjP+cPwTmc3DvATnpru6twa
0KjR5o585UOcciq7EWAoMrCFCfZYF5C4XGaZAxHI/SWooxeCQH84S8vfNLL2epVC
ixmLYo4X2ANDsnfbUV+ewhB0/L2905Et6NhPUgPD/1rm15MEZbowbB2K0pzr0QL9
/1hTL61InXx3jLxducJJFKN+HZ0zfDQdTkyafKrR9jb+GsdmnzYJ/vnfDG8MfPjp
GZ281YBqVmUeYJh5CPU+
=IUmn
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes collected over the last week, most importnatly two
cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
fix preventing systems using it from crashing during shutdown and two
ACPI scan fixes related to hotplug.
Specifics:
- Two cpufreq commits from the 3.10 cycle introduced regressions.
The first of them was buggy (it did way much more than it needed to
do) and the second one attempted to fix an issue introduced by the
first one. Fixes from Srivatsa S Bhat revert both.
- If autosleep triggers during system shutdown and the shutdown
callbacks of some device drivers have been called already, it may
crash the system. Fix from Liu Shuo prevents that from happening
by making try_to_suspend() check system_state.
- The ACPI memory hotplug driver doesn't clear its driver_data on
errors which may cause a NULL poiter dereference to happen later.
Fix from Toshi Kani.
- The ACPI namespace scanning code should not try to attach scan
handlers to device objects that have them already, which may
confuse things quite a bit, and it should rescan the whole
namespace branch starting at the given node after receiving a bus
check notify event even if the device at that particular node has
been discovered already. Fixes from Rafael J Wysocki.
- New ACPI video blacklist entry for a system whose initial backlight
setting from the BIOS doesn't make sense. From Lan Tianyu.
- Garbage string output avoindance for ACPI PNP from Liu Shuo.
- Two Kconfig fixes for issues introduced recently in the s3c24xx
cpufreq driver (when moving the driver to drivers/cpufreq) from
Paul Bolle.
- Trivial comment fix in pm_wakeup.h from Chanwoo Choi"
* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
PNP / ACPI: avoid garbage in resource name
cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
PM / Sleep: Fix comment typo in pm_wakeup.h
PM / Sleep: avoid 'autosleep' in shutdown progress
cpufreq: Revert commit a66b2e to fix suspend/resume regression
ACPI / memhotplug: Fix a stale pointer in error path
ACPI / scan: Always call acpi_bus_scan() for bus check notifications
ACPI / scan: Do not try to attach scan handlers to devices having them