Commit Graph

32839 Commits

Author SHA1 Message Date
Filipe David Borba Manana
633085c79c Btrfs: reset force_compress on btrfs_file_defrag failure
After we set force_compress with a new value (which was not being done
while holding the inode mutex), if an error happens and we jump to
the label out_ra, the force_compress property of the inode is not set
to BTRFS_COMPRESS_NONE (unlike in the case where no errors happen).

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:00 -04:00
Stefan Behrens
f420ee1e92 Btrfs: add mount option to force UUID tree checking
This should never be needed, but since all functions are there
to check and rebuild the UUID tree, a mount option is added that
allows to force this check and rebuild procedure.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:59 -04:00
Stefan Behrens
70f8017547 Btrfs: check UUID tree during mount if required
If the filesystem was mounted with an old kernel that was not
aware of the UUID tree, this is detected by looking at the
uuid_tree_generation field of the superblock (similar to how
the free space cache is doing it). If a mismatch is detected
at mount time, a thread is started that does two things:
1. Iterate through the UUID tree, check each entry, delete those
   entries that are not valid anymore (i.e., the subvol does not
   exist anymore or the value changed).
2. Iterate through the root tree, for each found subvolume, add
   the UUID tree entries for the subvolume (if they are not
   already there).

This mechanism is also used to handle and repair errors that
happened during the initial creation and filling of the tree.
The update of the uuid_tree_generation field (which indicates
that the state of the UUID tree is up to date) is blocked until
all create and repair operations are successfully completed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:58 -04:00
Stefan Behrens
26432799c9 Btrfs: introduce uuid-tree-gen field
In order to be able to detect the case that a filesystem is mounted
with an old kernel, add a uuid-tree-gen field like the free space
cache is doing it. It is part of the super block and written with
each commit. Old kernels do not know this field and don't update it.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:57 -04:00
Stefan Behrens
803b2f54fb Btrfs: fill UUID tree initially
When the UUID tree is initially created, a task is spawned that
walks through the root tree. For each found subvolume root_item,
the uuid and received_uuid entries in the UUID tree are added.
This is such a quick operation so that in case somebody wants
to unmount the filesystem while the task is still running, the
unmount is delayed until the UUID tree building task is finished.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:56 -04:00
Stefan Behrens
dd5f9615fc Btrfs: maintain subvolume items in the UUID tree
When a new subvolume or snapshot is created, a new UUID item is added
to the UUID tree. Such items are removed when the subvolume is deleted.
The ioctl to set the received subvolume UUID is also touched and will
now also add this received UUID into the UUID tree together with the
ID of the subvolume. The latter is also done when read-only snapshots
are created which inherit all the send/receive information from the
parent subvolume.

User mode programs use the BTRFS_IOC_TREE_SEARCH ioctl to search and
read in the UUID tree.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:55 -04:00
Stefan Behrens
f7a81ea4cc Btrfs: create UUID tree if required
This tree is not created by mkfs.btrfs. Therefore when a filesystem
is mounted writable and the UUID tree does not exist, this tree is
created if required. The tree is also added to the fs_info structure
and initialized, but this commit does not yet read or write UUID tree
elements.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:54 -04:00
Stefan Behrens
8f8ae8e213 Btrfs: support printing UUID tree elements
This commit adds support to print UUID tree elements to print-tree.c.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:53 -04:00
Stefan Behrens
07b30a49da Btrfs: introduce a tree for items that map UUIDs to something
Mapping UUIDs to subvolume IDs is an operation with a high effort
today. Today, the algorithm even has quadratic effort (based on the
number of existing subvolumes), which means, that it takes minutes
to send/receive a single subvolume if 10,000 subvolumes exist. But
even linear effort would be too much since it is a waste. And these
data structures to allow mapping UUIDs to subvolume IDs are created
every time a btrfs send/receive instance is started.

It is much more efficient to maintain a searchable persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created and deleted, and when the received
subvolume UUID is set by the btrfs-receive tool.

Therefore kernel code is added with this commit that is able to
maintain data structures in the filesystem that allow to quickly
search for a given UUID and to retrieve data that is assigned to
this UUID, like which subvolume ID is related to this UUID.

This commit adds a new tree to hold UUID-to-data mapping items. The
key of the items is the full UUID plus the key type BTRFS_UUID_KEY.
Multiple data blocks can be stored for a given UUID, a type/length/
value scheme is used.

Now follows the lengthy justification, why a new tree was added
instead of using the existing root tree:

The first approach was to not create another tree that holds UUID
items. Instead, the items should just go into the top root tree.
Unfortunately this confused the algorithm to assign the objectid
of subvolumes and snapshots. The reason is that
btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
the first created subvol or snapshot after mounting a filesystem,
and this function simply searches for the largest used objectid in
the root tree keys to pick the next objectid to assign. Of course,
the UUID keys have always been the ones with the highest offset
value, and the next assigned subvol ID was wastefully huge.

To use any other existing tree did not look proper. To apply a
workaround such as setting the objectid to zero in the UUID item
key and to implement collision handling would either add
limitations (in case of a btrfs_extend_item() approach to handle
the collisions) or a lot of complexity and source code (in case a
key would be looked up that is free of collisions). Adding new code
that introduces limitations is not good, and adding code that is
complex and lengthy for no good reason is also not good. That's the
justification why a completely new tree was introduced.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:52 -04:00
Sergei Trofimovich
171170c1c5 btrfs: mark some local function as 'static'
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:51 -04:00
Stefan Behrens
35a3621beb Btrfs: get rid of sparse warnings
make C=2 fs/btrfs/ CF=-D__CHECK_ENDIAN__

I tried to filter out the warnings for which patches have already
been sent to the mailing list, pending for inclusion in btrfs-next.

All these changes should be obviously safe.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:50 -04:00
Filipe David Borba Manana
18674c6cc1 Btrfs: don't miss inode ref items in BTRFS_IOC_INO_LOOKUP
If the inode ref key was not found and the current leaf slot
was 0 (first item in the leaf) the code would always return
-ENOENT. This was not correct because the desired inode ref
item might be the last item in the previous leaf.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:49 -04:00
Filipe David Borba Manana
a696cf3529 Btrfs: add missing error code to BTRFS_IOC_INO_LOOKUP handler
If the path doesn't fit in the input buffer, return ENAMETOOLONG
instead of returning with a success code (0) and a partially
filled and right justified buffer.

Also removed useless buffer pointer check outside the while loop.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:48 -04:00
Wang Shilong
b006b2e4f9 Btrfs: remove reduplicate check when disabling quota
We have checked 'quota_root' with qgroup_ioctl_lock held before,So
here the check is reduplicate, remove it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:47 -04:00
Wang Shilong
e685da14af Btrfs: move btrfs_free_qgroup_config() out of spin_lock and fix comments
btrfs_free_qgroup_config() is not only called by open/close_ctree(),but
also btrfs_disable_quota().And for btrfs_disable_quota(),we have set
'quota_root' to be null before calling btrfs_free_qgroup_config(),so it
is safe to cleanup in-memory structures without lock held.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:46 -04:00
Wang Shilong
4082bd3d73 Btrfs: fix oops when writing dirty qgroups to disk
When disabling quota, we should clear out list 'dirty_qgroups',otherwise,
we will get oops if enabling quota again. Fix this by abstracting similar
code from del_qgroup_rb().

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:45 -04:00
Josef Bacik
ba5e8f2e2d Btrfs: fix send issues related to inode number reuse
If you are sending a snapshot and specifying a parent snapshot we will walk the
trees and figure out where they differ and send the differences only.  The way
we check for differences are if the leaves aren't the same and if the keys are
not the same within the leaves.  So if neither leaf is the same (ie the leaf has
been cow'ed from the parent snapshot) we walk each item in the send root and
check it against the parent root.  If the items match exactly then we don't do
anything.  This doesn't quite work for inode refs, since they will just have the
name and the parent objectid.  If you move the file from a directory and then
remove that directory and re-create a directory with the same inode number as
the old directory and then move that file back into that directory we will
assume that nothing changed and you will get errors when you try to receive.

In order to fix this we need to do extra checking to see if the inode ref really
is the same or not.  So do this by passing down BTRFS_COMPARE_TREE_SAME if the
items match.  Then if the key type is an inode ref we can do some extra
checking, otherwise we just keep processing.  The extra checking is to look up
the generation of the directory in the parent volume and compare it to the
generation of the send volume.  If they match then they are the same directory
and we are good to go.  If they don't we have to add them to the changed refs
list.

This means we have to track the generation of the ref we're trying to lookup
when we iterate all the refs for a particular inode.  So in the case of looking
for new refs we have to get the generation from the parent volume, and in the
case of looking for deleted refs we have to get the generation from the send
volume to compare with.

There was also the issue of using a ulist to keep track of the directories we
needed to check.  Because we can get a deleted ref and a new ref for the same
inode number the ulist won't work since it indexes based on the value.  So
instead just dup any directory ref we find and add it to a local list, and then
process that list as normal and do away with using a ulist for this altogether.

Before we would fail all of the tests in the far-progs that related to moving
directories (test group 32).  With this patch we now pass these tests, and all
of the tests in the far-progs send testing suite.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:44 -04:00
Josef Bacik
dc11dd5d70 Btrfs: separate out tests into their own directory
The plan is to have a bunch of unit tests that run when btrfs is loaded when you
build with the appropriate config option.  My ultimate goal is to have a test
for every non-static function we have, but at first I'm going to focus on the
things that cause us the most problems.  To start out with this just adds a
tests/ directory and moves the existing free space cache tests into that
directory and sets up all of the infrastructure.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:38 -04:00
Josef Bacik
00361589d2 Btrfs: avoid starting a transaction in the write path
I noticed while looking at a deadlock that we are always starting a transaction
in cow_file_range().  This isn't really needed since we only need a transaction
if we are doing an inline extent, or if the allocator needs to allocate a chunk.
So push down all the transaction start stuff to be closer to where we actually
need a transaction in all of these cases.  This will hopefully reduce our write
latency when we are committing often.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:05 -04:00
Josef Bacik
9ffba8cda9 Btrfs: fix heavy delalloc related deadlock
I added a patch where we started taking the ordered operations mutex when we
waited on ordered extents.  We need this because we splice the list and process
it, so if a flusher came in during this scenario it would think the list was
empty and we'd usually get an early ENOSPC.  The problem with this is that this
lock is used in transaction committing.  So we end up with something like this

Transaction commit
	-> wait on writers

Delalloc flusher
	-> run_ordered_operations (holds mutex)
		->wait for filemap-flush to do its thing

flush task
	-> cow_file_range
		->wait on btrfs_join_transaction because we're commiting

some other task
	-> commit_transaction because we notice trans->transaction->flush is set
		-> run_ordered_operations (hang on mutex)

We need to disentangle the ordered operations flushing from the delalloc
flushing, since they are separate things.  This solves the deadlock issue I was
seeing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:04 -04:00
Josef Bacik
4ef31a45a0 Btrfs: fix the error handling wrt orphan items
There are several places where we BUG_ON() if we fail to remove the orphan items
and such, which is not ok, so remove those and either abort or just carry on.
This also fixes a problem where if we couldn't start a transaction we wouldn't
actually remove the orphan item reserve for the inode.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:03 -04:00
Josef Bacik
175a2b871f Btrfs: don't allow a subvol to be deleted if it is the default subovl
Eric pointed out that btrfs will happily allow you to delete the default subvol.
This is a problem obviously since the next time you go to mount the file system
it will freak out because it can't find the root.  Fix this by adding a check to
see if our default subvol points to the subvol we are trying to delete, and if
it does not allowing it to happen.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:02 -04:00
Josef Bacik
a05254143c Btrfs: skip subvol entries when checking if we've created a dir already
We have logic to see if we've already created a parent directory by check to see
if an inode inside of that directory has a lower inode number than the one we
are currently processing.  The logic is that if there is a lower inode number
then we would have had to made sure the directory was created at that previous
point.  The problem is that subvols inode numbers count from the lowest objectid
in the root tree, which may be less than our current progress.  So just skip if
our dir item key is a root item.  This fixes the original test and the xfstest
version I made that added an extra subvol create.  Thanks,

Reported-by: Emil Karlson <jekarlson@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:01 -04:00
Mark Fasheh
416161db9b btrfs: offline dedupe
This patch adds an ioctl, BTRFS_IOC_FILE_EXTENT_SAME which will try to
de-duplicate a list of extents across a range of files.

Internally, the ioctl re-uses code from the clone ioctl. This avoids
rewriting a large chunk of extent handling code.

Userspace passes in an array of file, offset pairs along with a length
argument. The ioctl will then (for each dedupe) do a byte-by-byte comparison
of the user data before deduping the extent. Status and number of bytes
deduped are returned for each operation.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:05:00 -04:00
Mark Fasheh
4b384318a7 btrfs: Introduce extent_read_full_page_nolock()
We want this for btrfs_extent_same. Basically readpage and friends do their
own extent locking but for the purposes of dedupe, we want to have both
files locked down across a set of readpage operations (so that we can
compare data). Introduce this variant and a flag which can be set for
extent_read_full_page() to indicate that we are already locked.

Partial credit for this patch goes to Gabriel de Perthuis <g2p.code@gmail.com>
as I have included a fix from him to the original patch which avoids a
deadlock on compressed extents.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:59 -04:00
Mark Fasheh
32b7c687c5 btrfs_ioctl_clone: Move clone code into it's own function
There's some 250+ lines here that are easily encapsulated into their own
function. I don't change how anything works here, just create and document
the new btrfs_clone() function from btrfs_ioctl_clone() code.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:58 -04:00
Mark Fasheh
77fe20dc62 btrfs: abtract out range locking in clone ioctl()
The range locking in btrfs_ioctl_clone is trivially broken out into it's own
function. This reduces the complexity of btrfs_ioctl_clone() by a small bit
and makes that locking code available to future functions in
fs/btrfs/ioctl.c

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:57 -04:00
Wang Shilong
a4fdb61e81 Btrfs: fix possible memory leak in find_parent_nodes()
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:56 -04:00
Filipe David Borba Manana
09fb99a696 Btrfs: return ENOSPC when target space is full
In extent-tree.c:do_chunk_alloc(), early on we returned 0 (success)
when the target space was full and when chunk allocation is needed.
However, later on in that same function we return ENOSPC if
btrfs_alloc_chunk() fails (and chunk allocation was needed) and
set the space's full flag.

This was inconsistent, as -ENOSPC should be returned if the space
is full and a chunk allocation needs to performed. If the space is
full but no chunk allocation is needed, just return 0 (success).

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:55 -04:00
Filipe David Borba Manana
ada9af215c Btrfs: don't ignore errors from btrfs_run_delayed_items
tree-log.c was ignoring the return value from btrfs_run_delayed_items()
in several places.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:54 -04:00
Filipe David Borba Manana
2bac325ea8 Btrfs: fix inode leak on kmalloc failure in tree-log.c
In tree-log.c:replay_one_name(), if memory allocation for
the name fails, ensure we iput the dir inode we got before
before we return.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:53 -04:00
Liu Bo
116e0024c4 Btrfs: allow compressed extents to be merged during defragment
The rule originally comes from nocow writing, but snapshot-aware
defrag is a different case, the extent has been writen and we're
not going to change the extent but add a reference on the data.

So we're able to allow such compressed extents to be merged into
one bigger extent if they're pointing to the same data.

Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:52 -04:00
David Sterba
8b87dc17fb btrfs: add mount option to set commit interval
I'ts hardcoded to 30 seconds which is fine for most users. Higher values
defer data being synced to permanent storage with obvious consequences
when the system crashes. The upper bound is not forced, but a warning is
printed if it's more than 300 seconds (5 minutes).

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:51 -04:00
Josef Bacik
9ec7267751 Btrfs: stop using GFP_ATOMIC when allocating rewind ebs
There is no reason we can't just set the path to blocking and then do normal
GFP_NOFS allocations for these extent buffers.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:50 -04:00
Josef Bacik
db7f3436c1 Btrfs: deal with enomem in the rewind path
We can get ENOMEM trying to allocate dummy bufs for the rewind operation of the
tree mod log.  Instead of BUG_ON()'ing in this case pass up ENOMEM.  I looked
back through the callers and I'm pretty sure I got everybody who did BUG_ON(ret)
in this path.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:49 -04:00
Josef Bacik
ebdad913aa Btrfs: check our parent dir when doing a compare send
When doing a send with a parent subvol we will check to see if the file we are
acting on is being overwritten and move it if we think it may be needed further
down the line during the send.  We check this by checking its directory and
making sure it existed in the parent and making sure the file existed in the
parent.  The problem with this check is that if we create a directory and a file
in that directory, and then snapshot, and then remove and re-create that same
directory and file with different inode numbers and then try to snapshot and
send with the original parent we will try and save the original file inside of
that directory.  This is a problem because during the receive we move the
directory out of the way because it is a completely new inode, which makes us
unable to find the old file inside of the directory when we try to move that out
of the way for the overwrite.  We fix this by checking the parent directory of
the inode we think we are overwriting.  If the parent directory generation in
the send root != the parent directory generation in the parent root then we know
it is a completely new directory and we need not bother with moving the file out
of the way because it would have been completely destroyed.  This fixes bz
60673.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:48 -04:00
Josef Bacik
36cce92287 Btrfs: handle errors when doing slow caching
Alex Lyakas reported a bug where wait_block_group_cache_progress() would wait
forever if a drive failed.  This is because we just bail out if there is an
error while trying to cache a block group, we don't update anybody who may be
waiting.  So this introduces a new enum for the cache state in case of error and
makes everybody bail out if we have an error.  Alex tested and verified this
patch fixed his problem.  This fixes bz 59431.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:47 -04:00
Filipe David Borba Manana
0f0fe8f710 Btrfs: add missing error handling to read_tree_block
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:46 -04:00
Dave Jones
eb2067f713 Fix leak in __btrfs_map_block error path
If we bail out when the stripe alloc fails, we need to undo the
earlier allocation of raid_map.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:45 -04:00
Filipe David Borba Manana
f5929cd814 Btrfs: add missing error check to find_parent_nodes
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:44 -04:00
Filipe David Borba Manana
395927a9d8 Btrfs: optimize function btrfs_read_chunk_tree
After reading all device items from the chunk tree, don't
exit the loop and then navigate down the tree again to find
the chunk items. Instead just read all device items and
chunk items with a single tree search. This is possible
because all device items are found before any chunk item in
the chunks tree.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:43 -04:00
Josef Bacik
6596a92819 Btrfs: don't bug_on when we fail when cleaning up transactions
There is no reason for this sort of jackassery.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:42 -04:00
Josef Bacik
b6c60c8018 Btrfs: change how we queue blocks for backref checking
Previously we only added blocks to the list to have their backrefs checked if
the level of the block is right above the one we are searching for.  This is
because we want to make sure we don't add the entire path up to the root to the
lists to make sure we process things one at a time.  This assumes that if any
blocks in the path to the root are going to be not checked (shared in other
words) then they will be in the level right above the current block on up.  This
isn't quite right though since we can have blocks higher up the list that are
shared because they are attached to a reloc root.  But we won't add this block
to be checked and then later on we will BUG_ON(!upper->checked).  So instead
keep track of wether or not we've queued a block to be checked in this current
search, and if we haven't go ahead and queue it to be checked.  This patch fixed
the panic I was seeing where we BUG_ON(!upper->checked).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:41 -04:00
Josef Bacik
d062d13cf1 Btrfs: check to see if we have an inline item properly
If our item isn't big enough to have an actual inline item when we have skinny
metadata enabled just return 1 in find_inline_backref so we can move on to the
next item.  This probably wasn't causing a problem since we check the values of
ptr and end properly, but just in case this will keep us from doing extra work.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:40 -04:00
Josef Bacik
151a41bc46 Btrfs: fix what bits we clear when erroring out from delalloc
First of all we no longer set EXTENT_DIRTY when we dirty an extent so this patch
removes the clearing of EXTENT_DIRTY since it confuses me.  This patch also adds
clearing EXTENT_DEFRAG and also doing EXTENT_DO_ACCOUNTING when we have errors.
This is because if we are clearing delalloc without adding an ordered extent
then we need to make sure the enospc handling stuff is accounted for.  Also if
this range was DEFRAG we need to make sure that bit is cleared so we dont leak
it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:39 -04:00
Josef Bacik
c2790a2e2b Btrfs: cleanup arguments to extent_clear_unlock_delalloc
This patch removes the io_tree argument for extent_clear_unlock_delalloc since
we always use &BTRFS_I(inode)->io_tree, and it separates out the extent tree
operations from the page operations.  This way we just pass in the extent bits
we want to clear and then pass in the operations we want done to the pages.
This is because I'm going to fix what extent bits we clear in some cases and
rather than add a bunch of new flags we'll just use the actual extent bits we
want to clear.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:38 -04:00
Anand Jain
8068a47e2a btrfs: use BTRFS_SUPER_INFO_SIZE macro at btrfs_read_dev_super()
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:37 -04:00
Miao Xie
125bac016d Btrfs: cache the extent map struct when reading several pages
When we read several pages at once, we needn't get the extent map object
every time we deal with a page, and we can cache the extent map object.
So, we can reduce the search time of the extent map, and besides that, we
also can reduce the lock contention of the extent map tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:36 -04:00
Miao Xie
9974090bdd Btrfs: batch the extent state operation when reading pages
In the past, we cached the checksum value in the extent state object, so we
had to split the extent state object by the block size, or we had no space
to keep this checksum value. But it increased the lock contention of the
extent state tree.

Now we removed this limit by caching the checksum into the bio object, so
it is unnecessary to do the extent state operations by the block size, we
can do it in batches, in this way, we can reduce the lock contention of
the extent state tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:35 -04:00
Miao Xie
883d0de485 Btrfs: batch the extent state operation in the end io handle of the read page
Before applying this patch, we set the uptodate flag and unlock the extent
by the page size, it is unnecessary, we can do it in batches, it can reduce
the lock contention of the extent state tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:34 -04:00
Miao Xie
facc8a2247 Btrfs: don't cache the csum value into the extent state tree
Before applying this patch, we cached the csum value into the extent state
tree when reading some data from the disk, this operation increased the lock
contention of the state tree.

Now, we just store the csum value into the bio structure or other unshared
structure, so we can reduce the lock contention.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:33 -04:00
Miao Xie
f2a09da9d0 Btrfs: add branch prediction hints in the read page end IO function
This patch add some branch prediction hints into the end IO function
of the read page, it reduced the percentage of the branch misses from
5.5% to 4.9%.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:32 -04:00
Miao Xie
09a7f7a289 Btrfs: remove unnecessary argument of bio_readpage_error()
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:31 -04:00
Wang Shilong
8507d216a4 Btrfs: add missing mounting options in btrfs_show_options()
Some options are missing in btrfs_show_options(), this patch
adds them.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:30 -04:00
Wang Shilong
1493381f2f Btrfs: use u64 for subvolid when parsing mount options
Although for most time, int is enough for subvolid, we should
ensure safety in theory.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:29 -04:00
Wang Shilong
2c334e87f3 Btrfs: add sanity checks regarding to parsing mount options
I just notice the following commands succeed:
	mount <dev> <mnt> -o thread_pool=-1

This is ridiculous, only positive thread_pool makes sense,this
patch adds sanity checks for them, and also catches the error of
ENOMEM if allocating memory fails.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:28 -04:00
Miao Xie
3cd846d1d7 Btrfs, raid56: fix memory leak when allocating pages for p/q stripes failed
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:27 -04:00
Dan Carpenter
3dc0e818af btrfs/raid56: fix and cleanup some error paths
The alloc_rbio() frees "raid_map" and "bbio" on error, so there is a
potential double free bug in raid56_parity_write().  The
raid56_parity_write() and raid56_parity_recover() functions should still
free "raid_map" and "bbio" on error if other errors occur though, so I
have added some more calls to kfree().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:26 -04:00
Josef Bacik
2112ac800d Btrfs: don't bother autodefragging if our root is going away
We can end up with inodes on the auto defrag list that exist on roots that are
going to be deleted.  This is extra work we don't need to do, so just bail if
our root has 0 root refs.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:25 -04:00
Josef Bacik
b37b39cd6b Btrfs: cleanup reloc roots properly on error
I was hitting the BUG_ON() at the end of merge_reloc_roots() because we were
aborting the transaction at some point previously and then getting an error when
we tried to drop the reloc root.  I fixed btrfs_drop_snapshot to re-add us to
the dead roots list if we failed, but this isn't the right thing to do for reloc
roots since it uses root->root_list for it's own stuff in order to know what
needs to be cleaned up.  So fix btrfs_drop_snapshot to only do the re-add if we
aren't dropping for reloc, and handle errors from merge_reloc_root() by dropping
the reloc root we are processing since it won't be on the list of roots to
cleanup.  With this patch my reproducer no longer panics.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:24 -04:00
Josef Bacik
50f1319cb5 Btrfs: reset ret in record_one_backref
I was getting warnings when running find ./ -type f -exec btrfs fi defrag -f {}
\; from record_one_backref because ret was set.  Turns out it was because it was
set to 1 because the search slot didn't come out exact and we never reset it.
So reset it to 0 right after the search so we don't leak this and get
uneccessary warnings.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:23 -04:00
Anand Jain
a1b83ac52d btrfs: fix get set label blocking against balance
btrfs_ioctl_get_fslabel() and btrfs_ioctl_set_fslabel()
used root->fs_info->volume_mutex mutex which caused operations
like balance to block set/get label operation until its
completion and generally balance operation takes a long
time to complete, so it will be annoying to the user when
cli appears hung

also this patch will add a bit of optimization within
the btrfs_ioctl_get_falabel() function.

v1->v2:
   use fs_info->super_lock instead of uuid_mutex

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:15 -04:00
Stefan Behrens
d4c34f6bff Btrfs: Print key type in decimal everywhere
This is confusing, sometimes the key type is printed in hex (without
a leading "0x" which makes things even more complicated), sometimes
in decimal...
Change it to be in decimal everywhere.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:40 -04:00
Liu Bo
599c75ec3f Btrfs/tracepoint: update delayed ref tracepoints
This shows exactly how btrfs processes the delayed refs onto disks,
which is very helpful on understanding delayed ref mechanism and
debugging related bugs.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:39 -04:00
chandan
1095cc0d92 btrfs_read_block_groups: Use enums to index
btrfs_space_info->block_groups.

The current code uses integer literals to index
btrfs_space_info->block_groups[] array. Instead use corresponding
enums from 'enum btrfs_raid_types'.

Signed-off-by: chandan <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:38 -04:00
Qu Wenruo
3cae210fa5 btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert
Some codes still use the cpu_to_lexx instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.

Also added some BTRFS_SETGET_STACK_FUNCS for btrfs_header btrfs_timespec
and other structures.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:37 -04:00
Wang Shilong
1e7bac1ef7 Btrfs: set qgroup_ulist to be null after calling ulist_free()
We call ulist_free(qgroup_ulist) in btrfs_free_qgroup_config(),
and btrfs_free_qgroup_config() may be called in two cases:

(1)umount filesystem
(2)disabling quota

However, if we firstly disable quota and then umount filesystem,
a double free happens. Fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:36 -04:00
Filipe David Borba Manana
647f63bd36 Btrfs: add missing error checks to add_data_references
The function relocation.c:add_data_references() was not checking
if all calls to __add_tree_block() and find_data_references() were
succeeding or not.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:35 -04:00
David Sterba
ccf39f92f3 btrfs: make errors in btrfs_num_copies less noisy
The log message level 'critical' is verbose enough, 'emergency' beeps on
all terminals.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:34 -04:00
Liu Bo
52ee28d249 Btrfs: make free space caching faster with many non-inline extent references
So to cache free space, we iterate every extent item to gather free space info.

When we have say 10,000 non-inline extent refs(such as BTRFS_EXTENT_DATA_REF),
it takes quite a long time, and since inline extent refs and non-inline ones have
same objectid in their keys, we can just re-search the tree with the next address
to skip non-inline references.

(This is found by dedup feature because dedup extents can end up with many
non-inline extent refs.)

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:24 -04:00
Jeff Mahoney
ee3441b490 btrfs: fall back to global reservation when removing subvolumes
I recently did some ENOSPC testing that involved filling the disk
while create and removing snapshots in a loop. During the test cycle,
I ran into an ENOSPC when trying to remove a snapshot, leaving the fs
stuck in ENOSPC even after a umount/mount cycle.

This patch allow subvolume removal to fall back onto the global
block reservation in order to succeed when it would have failed
otherwise.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:23 -04:00
Filipe David Borba Manana
74be951087 Btrfs: optimize btrfs_lookup_extent_info()
If we're looking for a metadata item in the tree and the
search fails with return value of 1, and the slot doesn't
point to the first item in the leaf, check if the previous
item in the leaf corresponds to an extent item for the same
object id - if it does, then don't do another tree search
to get it.

This optimization is already done by btrfs-progs.

V2: updated commit message.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:22 -04:00
Carey Underwood
d790155457 Btrfs: Release uuid_mutex for shrink during device delete
Device scanning waits on the uuid_mutex, which can result in a very long
wait if dev delete is shrinking the device.

Signed-off-by: Carey Underwood <cwillu@cwillu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:21 -04:00
Josef Bacik
b2aaaa3b8c Btrfs: set lockdep class before locking new extent buffer
We've been seeing spurious complaints out of lockdep because the lock class name
changes.  This is happening because when we drop a snapshot we will lock a block
before we've read it in, which sets the lockdep class to whatever the default
is.  Then once we read the thing in we reset the lockdep class to what it is
supposed to be, which blows lockdeps' mind.  This patch should fix the problem,
it appears to be the only place where we do this sort of thing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:20 -04:00
Stefan Agner
59516f6017 Btrfs: return -1 when lzo compression makes data bigger
With this fix the lzo code behaves like the zlib code by returning an
error
code when compression does not help reduce the size of the file.
This is currently not a bug since the compressed size is checked again
in
the calling method compress_file_range.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:19 -04:00
Josef Bacik
c8cc634165 Btrfs: stop using GFP_ATOMIC for the tree mod log allocations
Previously we held the tree mod lock when adding stuff because we use it to
check and see if we truly do want to track tree modifications.  This is
admirable, but GFP_ATOMIC in a critical area that is going to get hit pretty
hard and often is not nice.  So instead do our basic checks to see if we don't
need to track modifications, and if those pass then do our allocation, and then
when we go to insert the new modification check if we still care, and if we
don't just free up our mod and return.  Otherwise we're good to go and we can
carry on.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 07:57:17 -04:00
Linus Torvalds
c95389b4cd Merge branch 'akpm' (patches from Andrew Morton)
Merge fixes from Andrew Morton:
 "Five fixes.

  err, make that six.  let me try again"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers
  memcg: check that kmem_cache has memcg_params before accessing it
  drivers/base/memory.c: fix show_mem_removable() to handle missing sections
  IPC: bugfix for msgrcv with msgtyp < 0
  Omnikey Cardman 4000: pull in ioctl.h in user header
  timer_list: correct the iterator for timer_list
2013-08-28 19:31:33 -07:00
Goldwyn Rodrigues
49fa8140e4 fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers
While using pacemaker/corosync, the node numbers are generated using IP
address as opposed to serial node number generation.  This may not fit
in a 8-byte string.  Use a bigger string to print the complete node
number.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-28 19:26:38 -07:00
Waiman Long
98474236f7 vfs: make the dentry cache use the lockref infrastructure
This just replaces the dentry count/lock combination with the lockref
structure that contains both a count and a spinlock, and does the
mechanical conversion to use the lockref infrastructure.

There are no semantic changes here, it's purely syntactic.  The
reference lockref implementation uses the spinlock exactly the same way
that the old dcache code did, and the bulk of this patch is just
expanding the internal "d_count" use in the dcache code to use
"d_lockref.count" instead.

This is purely preparation for the real change to make the reference
count updates be lockless during the 3.12 merge window.

[ As with the previous commit, this is a rewritten version of a concept
  originally from Waiman, so credit goes to him, blame for any errors
  goes to me.

  Waiman's patch had some semantic differences for taking advantage of
  the lockless update in dget_parent(), while this patch is
  intentionally a pure search-and-replace change with no semantic
  changes.     - Linus ]

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-28 18:24:59 -07:00
Linus Torvalds
f0cc6ffb8c Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink"
This reverts commit bb2314b479.

It wasn't necessarily wrong per se, but we're still busily discussing
the exact details of this all, so I'm going to revert it for now.

It's true that you can already do flink() through /proc and that flink()
isn't new.  But as Brad Spengler points out, some secure environments do
not mount proc, and flink adds a new interface that can avoid path
lookup of the source for those kinds of environments.

We may re-do this (and even mark it for stable backporting back in 3.11
and possibly earlier) once the whole discussion about the interface is done.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-28 09:18:05 -07:00
Linus Torvalds
83c425d222 One JFS patch to fix an incompatibility with NFSv4 resulting in the nfs
client reporting a readdir loop.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.21 (GNU/Linux)
 
 iQIcBAABAgAGBQJSG9IHAAoJEDaohF61QIxkptgP/jTEooTZ2uMmIouqj6rhrc81
 avrqtB26Ww74XBzuyiTuSBLUUJXGaLveC/i9rR2XOyA7cXJUbrqvqpZ2OExOURsf
 gHeLX20RKwKgaRQ6R9Xcri6wjWql4YHL/z3tI/DGpDkMfA0siocbHW+GZSfmMZ8c
 EOcAECY+tqrAgFL1oESTinglGNZ2V1f/IyKB8DiUDgDuMkV6Zw3083Ph7xB/bw9T
 Jffg6nvkST2mzZNTKgU8W3extW/X9GxxleYLED2jwEYioOFVAlvSKZTFD65NVUya
 1l3YH68jROnDSoHmgOup0C6i51/e7QFuBNdyR/8UK2OyOXkJ1wneXutUIiysWWty
 dY9+kxSSdpNuZvflGrZlP7yoEjGgRO/893owUcusiSgLTpQpNs2y7OVjCBwsHGa7
 AIWyu+FyOnNnmO6oiYkmNQlE3bAoz3z0CO7IuP5lm5HRgMIxp+k2k8yOHon/PcuF
 juQsbMydGcNVAjJlQuxCDh1uijOGLbDol/NekpUnyL02oy294raiWdy4IUaqWIHG
 Y5i1yeVwFbr7QsI8RCbEliCRwP5YMWSp4irEUozJTDplAn4AnJ5AJdYq9KM5JHMm
 qBHXl/asWbxGQZhmig2xzojZ+HVPFVWilpBz7OvsMXJaulrRqxV3DgAudIlZPP4B
 mB8DPzctwmgzmlgCqzRW
 =+aYs
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from Dave Kleikamp:
 "One JFS patch to fix an incompatibility with NFSv4 resulting in the
  nfs client reporting a readdir loop"

* tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix readdir cookie incompatibility with NFSv4
2013-08-26 19:22:49 -07:00
Linus Torvalds
4d4323ea2d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes from the last week or so"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: collect_mounts() should return an ERR_PTR
  bfs: iget_locked() doesn't return an ERR_PTR
  efs: iget_locked() doesn't return an ERR_PTR()
  proc: kill the extra proc_readfd_common()->dir_emit_dots()
  cope with potentially long ->d_dname() output for shmem/hugetlb
2013-08-25 12:25:38 -07:00
Linus Torvalds
5befb98b30 SCSI fixes on 20130824
This is a set of small bug fixes for lpfc and zfcp and a fix for a fairly
 nasty bug in sg where a process which cancels I/O completes in a kernel thread
 which would then try to write back to the now gone userspace and end up
 writing to a random kernel address instead.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSGIaHAAoJEDeqqVYsXL0MbPUH/3UXceHlgYRrwYZ0C10Ao5XB
 WA8RWsDsX9UJxG68zEd8ED1aRHhmkfm4pEdMQ8DHW7+B7mvNhpb6mF0wxvmS5aIj
 OVI0G+3KmghA3aDWbTtg8ED0wJ4q3ftcyzl4Fhpat+yA4g/BW7iJNDCv17nvZ90f
 hNmdGm23wuYCid7JWNDO79spSp0q6wPJhG6ynJYOtzX1GvpEliZiGB0IOR3K44nW
 cF6+Uigs3+6RGXX9UHOMrk9Ug3YFHfok224vvydcbRXVh8uneB8RQ6tziJVFI8tE
 WPgAv2oDzly8l+ku71CqrjzG7fSwCr9Urlog9cEugE1iUmFCIQm6xJcSSnGJqaY=
 =jKT6
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of small bug fixes for lpfc and zfcp and a fix for a
  fairly nasty bug in sg where a process which cancels I/O completes in
  a kernel thread which would then try to write back to the now gone
  userspace and end up writing to a random kernel address instead"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] zfcp: remove access control tables interface (keep sysfs files)
  [SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
  [SCSI] zfcp: fix lock imbalance by reworking request queue locking
  [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
  [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on
2013-08-24 11:33:21 -07:00
Dan Carpenter
52e220d357 VFS: collect_mounts() should return an ERR_PTR
This should actually be returning an ERR_PTR on error instead of NULL.
That was how it was designed and all the callers expect it.

[AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts()
return errors" missed - originally collect_mounts() was expected to return
NULL on failure]

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-24 12:10:29 -04:00
Dan Carpenter
821ff77c6c bfs: iget_locked() doesn't return an ERR_PTR
iget_locked() returns a NULL on error, it doesn't return an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-24 12:10:22 -04:00
Dan Carpenter
136eefa48d efs: iget_locked() doesn't return an ERR_PTR()
The iget_locked() function returns NULL on error and never an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-24 12:10:22 -04:00
Oleg Nesterov
a5a1955e0c proc: kill the extra proc_readfd_common()->dir_emit_dots()
proc_readfd_common() does dir_emit_dots() twice in a row,
we need to do this only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-24 12:10:22 -04:00
Al Viro
118b230225 cope with potentially long ->d_dname() output for shmem/hugetlb
dynamic_dname() is both too much and too little for those - the
output may be well in excess of 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()) and vsnprintf() is an overkill for those
guys.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-08-24 12:10:17 -04:00
Vyacheslav Dubeyko
4bf93b50fd nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
Fix the issue with improper counting number of flying bio requests for
BIO_EOPNOTSUPP error detection case.

The sb_nbio must be incremented exactly the same number of times as
complete() function was called (or will be called) because
nilfs_segbuf_wait() will call wail_for_completion() for the number of
times set to sb_nbio:

  do {
      wait_for_completion(&segbuf->sb_bio_event);
  } while (--segbuf->sb_nbio > 0);

Two functions complete() and wait_for_completion() must be called the
same number of times for the same sb_bio_event.  Otherwise,
wait_for_completion() will hang or leak.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-23 09:51:22 -07:00
Vyacheslav Dubeyko
2df37a19c6 nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
Remove double call of bio_put() in nilfs_end_bio_write() for the case of
BIO_EOPNOTSUPP error detection.  The issue was found by Dan Carpenter
and he suggests first version of the fix too.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-23 09:51:22 -07:00
Roland Dreier
35dc248383 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

		result = wait_event_interruptible(sfp->read_wait,
			(srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

		srp->orphan = 1;
		write_unlock_irq(&sfp->rq_list_lock);
		return result;	/* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

	write_lock_irqsave(&sfp->rq_list_lock, iflags);
	if (unlikely(srp->orphan)) {
		if (sfp->keep_orphan)
			srp->sg_io_owned = 0;
		else
			done = 0;
	}
	srp->done = done;
	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

	if (likely(done)) {
		/* Now wake up any sg_read() that is waiting for this
		 * packet.
		 */
		wake_up_interruptible(&sfp->read_wait);
		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
		kref_put(&sfp->f_ref, sg_remove_sfp);
	} else {
		INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
		schedule_work(&srp->ew.work);
	}

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.

Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-21 10:58:35 -07:00
Linus Torvalds
fd3930f70c proc: more readdir conversion bug-fixes
In the previous commit, Richard Genoud fixed proc_root_readdir(), which
had lost the check for whether all of the non-process /proc entries had
been returned or not.

But that in turn exposed _another_ bug, namely that the original readdir
conversion patch had yet another problem: it had lost the return value
of proc_readdir_de(), so now checking whether it had completed
successfully or not didn't actually work right anyway.

This reinstates the non-zero return for the "end of base entries" that
had also gotten lost in commit f0c3b5093a ("[readdir] convert
procfs").  So now you get all the base entries *and* you get all the
process entries, regardless of getdents buffer size.

(Side note: the Linux "getdents" manual page actually has a nice example
application for testing getdents, which can be easily modified to use
different buffers.  Who knew? Man-pages can be useful)

Reported-by: Emmanuel Benisty <benisty.e@gmail.com>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Cc: Richard Genoud <richard.genoud@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-19 16:26:12 -07:00
Richard Genoud
94fc5d9de5 proc: return on proc_readdir error
Commit f0c3b5093a ("[readdir] convert procfs") introduced a bug on the
listing of the proc file-system.  The return value of proc_readdir()
isn't tested anymore in the proc_root_readdir function.

This lead to an "interesting" behaviour when we are using the getdents()
system call with a buffer too small: instead of failing, it returns the
first entries of /proc (enough to fill the given buffer), plus the PID
directories.

This is not triggered on glibc (as getdents is called with a 32KB
buffer), but on uclibc, the buffer size is only 1KB, thus some proc
entries are missing.

See https://lkml.org/lkml/2013/8/12/288 for more background.

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-19 09:47:27 -07:00
Linus Torvalds
d6a5e06cd1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull gfs2 fixes from Steven Whitehouse:
 "Out of these five patches, the one for ensuring that the number of
  revokes is not exceeded, and the one for checking the glock is not
  already held in gfs2_getxattr are the two most important.  The latter
  can be triggered by selinux.

  The other three patches are very small and fix mostly fairly trivial
  issues"

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Check for glock already held in gfs2_getxattr
  GFS2: alloc_workqueue() doesn't return an ERR_PTR
  GFS2: don't overrun reserved revokes
  GFS2: WQ_NON_REENTRANT is meaningless and going away
  GFS2: Fix typo in gfs2_create_inode()
2013-08-19 09:30:12 -07:00
Steven Whitehouse
7bd9ee58a4 GFS2: Check for glock already held in gfs2_getxattr
Since the introduction of atomic_open, gfs2_getxattr can be
called with the glock already held, so we need to allow for
this.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>
2013-08-19 09:33:57 +01:00
Dan Carpenter
dfc4616dde GFS2: alloc_workqueue() doesn't return an ERR_PTR
alloc_workqueue() returns a NULL on error, it doesn't return an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-08-19 09:33:43 +01:00
Benjamin Marzinski
1bc333f4cf GFS2: don't overrun reserved revokes
When run during fsync, a gfs2_log_flush could happen between the
time when gfs2_ail_flush checked the number of blocks to revoke,
and when it actually started the transaction to do those revokes.
This occassionally caused it to need more revokes than it reserved,
causing gfs2 to crash.

Instead of just reserving enough revokes to handle the blocks that
currently need them, this patch makes gfs2_ail_flush reserve the
maximum number of revokes it can, without increasing the total number
of reserved log blocks. This patch also passes the number of reserved
revokes to __gfs2_ail_flush() so that it doesn't go over its limit
and cause a crash like we're seeing. Non-fsync calls to __gfs2_ail_flush
will still cause a BUG() necessary revokes are skipped.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-08-19 09:33:16 +01:00
Tejun Heo
d08fa65a81 GFS2: WQ_NON_REENTRANT is meaningless and going away
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
WQ_NON_REENTRANT no-op and the flag is going away.  Remove its usages.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
2013-08-19 09:33:01 +01:00
Steven Whitehouse
2523d47a79 GFS2: Fix typo in gfs2_create_inode()
PTR_RET should be PTR_ERR

Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-08-19 09:32:29 +01:00
Linus Torvalds
a08797e853 Two jbd2 bug fixes, one of which is a regression fix.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJSD2uJAAoJENNvdpvBGATwPwEP/R1Yu0UReTDVSmRwn8MGbk5O
 M7QwPgUVnEdqr86v4qx7kqOmKtwSQKfSr7+07UGwEBVasPPI1ezXT5c2VAwz146z
 5EClKIJoZoKIdWwM0NvUYxb7p5HgTLubho5RTsHmjOGIy37ju+5rFKZ44aHt9siY
 J3w5v1EruHdDXwjyiZcIeHFUBGV34x9zPtIGy3yY06gEPAUMeEXShiH/Uj3EroKH
 50PkwECBvqFuO4HpunhSYfNc3csTir2ZAx4judaV0s5ebyxzyZ398ql581q73mvP
 m5KREYYVLQnN9VzCnsjLsnyZboh6XoTxmlmblNVOr0LihyKLS/Nof6pvrWEqgnBf
 6NpR6Y8cPVUq+rhhXwFx/RIkVh2EIME0T5dZplr7fVaPxzI1ZKud8p18WifxMAix
 y3rbdZiEA//OuH5u5XqV0/zv3aJUP7XjdT8dia5iq3FazFOdyT1VfVXzAfaXyqZk
 JlFguSrOJbQe+vgk2gFLWLH8L4wsnGlSUv1AeDEboriushmZqYlAn10y6ENnM5yW
 IMFNA6Wyz35MghsKMKV57H7B05VS8em6NGtHZVhTBt7JoiFn7PBvic/KcK/czpDy
 W2Y8K8azhuscmQfa/RjuVDtwakbNwTu341e7hIUOy41qL/x9nNr07ZRAeF68Nesd
 p4uRmJxLSW9PReYp9n4V
 =OroQ
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull jbd2 bug fixes from Ted Ts'o:
 "Two jbd2 bug fixes, one of which is a regression fix"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: Fix oops in jbd2_journal_file_inode()
  jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()
2013-08-17 10:43:19 -07:00