linux

Author	SHA1	Message	Date
Michael Halcrow	e77a56ddce	[PATCH] eCryptfs: Encrypted passthrough Provide an option to provide a view of the encrypted files such that the metadata is always in the header of the files, regardless of whether the metadata is actually in the header or in the extended attribute. This mode of operation is useful for applications like incremental backup utilities that do not preserve the extended attributes when directly accessing the lower files. With this option enabled, the files under the eCryptfs mount point will be read-only. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	dd2a3b7ad9	[PATCH] eCryptfs: Generalize metadata read/write Generalize the metadata reading and writing mechanisms, with two targets for now: metadata in file header and metadata in the user.ecryptfs xattr of the lower file. [akpm@osdl.org: printk warning fix] [bunk@stusta.de: make some needlessly global code static] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	17398957aa	[PATCH] eCryptfs: xattr flags and mount options This patch set introduces the ability to store cryptographic metadata into a lower file extended attribute rather than the lower file header region. This patch set implements two new mount options: ecryptfs_xattr_metadata - When set, newly created files will have their cryptographic metadata stored in the extended attribute region of the file rather than the header. When storing the data in the file header, there is a minimum of 8KB reserved for the header information for each file, making each file at least 12KB in size. This can take up a lot of extra disk space if the user creates a lot of small files. By storing the data in the extended attribute, each file will occupy a minimum of only 4KB of space. As the eCryptfs metadata set becomes larger with new features such as multi-key associations, most popular filesystems will not be able to store all of the information in the xattr region in some cases due to space constraints. However, the majority of users will only ever associate one key per file, so most users will be okay with storing their data in the xattr region. This option should be used with caution. I want to emphasize that the xattr must be maintained under all circumstances, or the file will be rendered permanently unrecoverable. The last thing I want is for a user to forget to set an xattr flag in a backup utility, only to later discover that their backups are worthless. ecryptfs_encrypted_view - When set, this option causes eCryptfs to present applications a view of encrypted files as if the cryptographic metadata were stored in the file header, whether the metadata is actually stored in the header or in the extended attributes. No matter what eCryptfs winds up doing in the lower filesystem, I want to preserve a baseline format compatibility for the encrypted files. As of right now, the metadata may be in the file header or in an xattr.
There is no reason why the metadata could not be put in a separate file in future versions. Without the compatibility mode, backup utilities would have to know to back up the metadata file along with the files. The semantics of eCryptfs have always been that the lower files are self-contained units of encrypted data, and the only additional information required to decrypt any given eCryptfs file is the key. That is what has always been emphasized about eCryptfs lower files, and that is what users expect. Providing the encrypted view option will provide a way to userspace applications wherein they can always get to the same old familiar eCryptfs encrypted files, regardless of what eCryptfs winds up doing with the metadata behind the scenes. This patch: Add extended attribute support to version bit vector, flags to indicate when xattr or encrypted view modes are enabled, and support for the new mount options. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	dddfa461fc	[PATCH] eCryptfs: Public key; packet management Public key support code. This reads and writes packets in the header that contain public key encrypted file keys. It calls the messaging code in the previous patch to send and receive encryption and decryption request packets from the userspace daemon. [akpm@osdl.org: cleab fix] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Michael Halcrow	88b4a07e66	[PATCH] eCryptfs: Public key transport mechanism This is the transport code for public key functionality in eCryptfs. It manages encryption/decryption request queues with a transport mechanism. Currently, netlink is the only implemented transport. Each inode has a unique File Encryption Key (FEK). Under passphrase, a File Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase combo on mount. This FEKEK encrypts each FEK and writes it into the header of each file using the packet format specified in RFC 2440. This is all symmetric key encryption, so it can all be done via the kernel crypto API. These new patches introduce public key encryption of the FEK. There is no asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes the FEK encryption and decryption out to a userspace daemon. After considering our requirements and determining the complexity of using various transport mechanisms, we settled on netlink for this communication. eCryptfs stores authentication tokens into the kernel keyring. These tokens correlate with individual keys. For passphrase mode of operation, the authentication token contains the symmetric FEKEK. For public key, the authentication token contains a PKI type and an opaque data blob managed by individual PKI modules in userspace. Each user who opens a file under an eCryptfs partition mounted in public key mode must be running a daemon. That daemon has the user's credentials and has access to all of the keys to which the user should have access. The daemon, when started, initializes the pluggable PKI modules available on the system and registers itself with the eCryptfs kernel module. Userspace utilities register public key authentication tokens into the user session keyring. These authentication tokens correlate key signatures with PKI modules and PKI blobs. 
The PKI blobs contain PKI-specific information necessary for the PKI module to carry out asymmetric key encryption and decryption. When the eCryptfs module parses the header of an existing file and finds a Tag 1 (Public Key) packet (see RFC 2440), it reads in the public key identifier (signature). The asymmetrically encrypted FEK is in the Tag 1 packet; eCryptfs puts together a decrypt request packet containing the signature and the encrypted FEK, then it passes it to the daemon registered for the current->euid via a netlink unicast to the PID of the daemon, which was registered at the time the daemon was started by the user. The daemon actually just makes calls to libecryptfs, which implements request packet parsing and manages PKI modules. libecryptfs grabs the public key authentication token for the given signature from the user session keyring. This auth tok tells libecryptfs which PKI module should receive the request. libecryptfs then makes a decrypt() call to the PKI module, and it passes along the PKI blob from the auth tok. The PKI uses the blob to figure out how it should decrypt the data passed to it; it performs the decryption and passes the decrypted data back to libecryptfs. libecryptfs then puts together a reply packet with the decrypted FEK and passes that back to the eCryptfs module. The eCryptfs module manages these request callouts to userspace code via message context structs. The module maintains an array of message context structs and places the elements of the array on two lists: a free and an allocated list. When eCryptfs wants to make a request, it moves a msg ctx from the free list to the allocated list, sets its state to pending, and fires off the message to the user's registered daemon. When eCryptfs receives a netlink message (via the callback), it correlates the msg ctx struct in the alloc list with the data in the message itself. The msg->index contains the offset into the array of msg ctx structs.
It verifies that the registered daemon PID is the same as the PID of the process that sent the message. It also validates a sequence number between the received packet and the msg ctx. Then, it copies the contents of the message (the reply packet) into the msg ctx struct, sets the state in the msg ctx to done, and wakes up the process that was sleeping while waiting for the reply. The sleeping process was whatever was performing the sys_open(). This process originally called ecryptfs_send_message(); it is now in ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx state was set to done, it returns a pointer to the message contents (the reply packet) and returns. If all went well, this packet contains the decrypted FEK, which is then copied into the crypt_stat struct, and life continues as normal. The case for creation of a new file is very similar, only instead of a decrypt request, eCryptfs sends out an encrypt request. > - We have a great clod of key management code in-kernel. Why is that > not suitable (or growable) for public key management? eCryptfs uses Howells' keyring to store persistent key data and PKI state information. It defers public key cryptographic transformations to userspace code. The userspace data manipulation request really is orthogonal to key management in and of itself. What eCryptfs basically needs is a secure way to communicate with a particular daemon for a particular task doing a syscall, based on the UID. Nothing running under another UID should be able to access that channel of communication. > - Is it appropriate that new infrastructure for public key > management be private to a particular fs? The messaging.c file contains a lot of code that, perhaps, could be extracted into a separate kernel service. In essence, this would be a sort of request/reply mechanism that would involve a userspace daemon.
I am not aware of anything that does quite what eCryptfs does, so I was not aware of any existing tools to do just what we wanted. > What happens if one of these daemons exits without sending a quit > message? There is a stale uid<->pid association in the hash table for that user. When the user registers a new daemon, eCryptfs cleans up the old association and generates a new one. See ecryptfs_process_helo(). > - _why_ does it use netlink? Netlink provides the transport mechanism that would minimize the complexity of the implementation, given that we can have multiple daemons (one per user). I explored the possibility of using relayfs, but that would involve having to introduce control channels and a protocol for creating and tearing down channels for the daemons. We do not have to worry about any of that with netlink. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
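The message-context bookkeeping described in this commit message (an array of contexts split across a free list and an allocated list, with the reply matched by array index and sequence number) can be sketched in plain C. This is a minimal single-threaded sketch, not the eCryptfs code: every name below (`msg_ctx_pool`, `pool_acquire`, `pool_complete`, `pool_release`) is hypothetical, and the locking, the daemon PID check, and the sleep/wake-up step are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the eCryptfs implementation. */
enum msg_ctx_state { MSG_CTX_FREE, MSG_CTX_PENDING, MSG_CTX_DONE };

struct msg_ctx {
    enum msg_ctx_state state;
    unsigned int index;    /* position in the array, echoed back in the reply */
    unsigned int counter;  /* sequence number validated against the reply */
    struct msg_ctx *next;  /* link within the free or the allocated list */
};

#define MSG_CTX_COUNT 4

struct msg_ctx_pool {
    struct msg_ctx arr[MSG_CTX_COUNT];
    struct msg_ctx *free_list;
    struct msg_ctx *alloc_list;
    unsigned int next_counter;
};

/* Put every array element on the free list, lowest index first. */
void pool_init(struct msg_ctx_pool *p)
{
    p->free_list = NULL;
    p->alloc_list = NULL;
    p->next_counter = 0;
    for (unsigned int i = MSG_CTX_COUNT; i-- > 0; ) {
        p->arr[i].state = MSG_CTX_FREE;
        p->arr[i].index = i;
        p->arr[i].counter = 0;
        p->arr[i].next = p->free_list;
        p->free_list = &p->arr[i];
    }
}

/* Making a request: move a context from the free list to the allocated list. */
struct msg_ctx *pool_acquire(struct msg_ctx_pool *p)
{
    struct msg_ctx *ctx = p->free_list;
    if (ctx == NULL)
        return NULL;                    /* no free context available */
    p->free_list = ctx->next;
    ctx->next = p->alloc_list;
    p->alloc_list = ctx;
    ctx->state = MSG_CTX_PENDING;
    ctx->counter = ++p->next_counter;
    return ctx;
}

/* Receiving a reply: look up by index, validate the sequence number. */
int pool_complete(struct msg_ctx_pool *p, unsigned int index, unsigned int counter)
{
    if (index >= MSG_CTX_COUNT)
        return -1;
    struct msg_ctx *ctx = &p->arr[index];
    if (ctx->state != MSG_CTX_PENDING || ctx->counter != counter)
        return -1;                      /* stale or mismatched reply */
    ctx->state = MSG_CTX_DONE;
    return 0;
}

/* After the waiter has consumed the reply: return the context to the free list. */
void pool_release(struct msg_ctx_pool *p, struct msg_ctx *ctx)
{
    struct msg_ctx **pp = &p->alloc_list;
    while (*pp != NULL && *pp != ctx)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return;                         /* not on the allocated list */
    *pp = ctx->next;
    ctx->state = MSG_CTX_FREE;
    ctx->next = p->free_list;
    p->free_list = ctx;
}
```

Indexing the array directly makes the reply lookup constant-time, while the sequence number guards against a late reply landing on a context that has since been reused.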
Adrian Bunk	b5d5dfbd59	[PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC NFS_SUPER_MAGIC is already defined in include/linux/magic.h Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
Chuck Lever	27459f0940	[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses Expand the rq_addr field to allow it to contain larger addresses. Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then everywhere the 'sockaddr_in' was referenced, we use instead an accessor function (svc_addr_in) which safely casts the _storage to _in. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:36 -08:00
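The accessor pattern this commit describes (store a `sockaddr_storage`, hand out a `sockaddr_in` view through a helper) looks roughly like the userspace sketch below. The struct and function names are hypothetical stand-ins, not the kernel's `svc_rqst` or `svc_addr_in`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the request structure; not the kernel's svc_rqst. */
struct rpc_request {
    struct sockaddr_storage addr;   /* large enough for IPv4 or IPv6 */
};

/* Accessor that views the generic storage as an IPv4 socket address. */
static inline struct sockaddr_in *rpc_request_addr_in(struct rpc_request *rq)
{
    return (struct sockaddr_in *)&rq->addr;
}

/* Fill in an IPv4 peer address, going through the accessor only. */
void rpc_request_set_ipv4(struct rpc_request *rq, const char *dotted,
                          unsigned short port)
{
    assert(rq != NULL && dotted != NULL);
    memset(&rq->addr, 0, sizeof(rq->addr));
    struct sockaddr_in *sin = rpc_request_addr_in(rq);
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    inet_pton(AF_INET, dotted, &sin->sin_addr);
}
```

Routing every IPv4 access through one helper keeps the cast in a single place, so call sites do not need to change again when a second address family is added.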
Chuck Lever	ad06e4bd62	[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing There are loads of places where the RPC server assumes that the rq_addr fields contains an IPv4 address. Top among these are error and debugging messages that display the server's IP address. Let's refactor the address printing into a separate function that's smart enough to figure out the difference between IPv4 and IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
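A family-aware formatter of the kind this commit describes can be sketched in userspace with `inet_ntop()`. The function name `format_peer_addr` is an assumption made for illustration; the kernel patch uses its own helper and its own formatting code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: render the address held in *ss into buf, choosing
 * the textual form from the address family instead of assuming IPv4.
 * Returns buf on success, or NULL if inet_ntop() fails.
 */
const char *format_peer_addr(const struct sockaddr_storage *ss,
                             char *buf, size_t len)
{
    assert(ss != NULL && buf != NULL && len > 0);
    switch (ss->ss_family) {
    case AF_INET:
        return inet_ntop(AF_INET,
                         &((const struct sockaddr_in *)ss)->sin_addr,
                         buf, (socklen_t)len);
    case AF_INET6:
        return inet_ntop(AF_INET6,
                         &((const struct sockaddr_in6 *)ss)->sin6_addr,
                         buf, (socklen_t)len);
    default:
        snprintf(buf, len, "unknown family %d", (int)ss->ss_family);
        return buf;
    }
}
```

A buffer of `INET6_ADDRSTRLEN` bytes is large enough for both the IPv4 and the IPv6 textual forms.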
Chuck Lever	482fb94e1b	[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper Sometimes we need to create an RPC service but not register it with the local portmapper. NFSv4 delegation callback, for example. Change the svc_makesock() API to allow optionally creating temporary or permanent sockets, optionally registering with the local portmapper, and make it return the ephemeral port of the new socket. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:35 -08:00
Eric W. Biederman	41487c65bf	[PATCH] pid: replace do/while_each_task_pid with do/while_each_pid_task There isn't any real advantage to this change except that it allows the old functions to be removed. Which is easier on maintenance and puts the code in a more uniform style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Eric W. Biederman	ab521dc0f8	[PATCH] tty: update the tty layer to work with struct pid Of kernel subsystems that work with pids, the tty layer is probably the largest consumer. But it has the nice virtue that the association with a session only lasts until the session leader exits. Which means that no reference counting is required. So using struct pid winds up being a simple optimization to avoid hash table lookups. In the long term the use of pid_nr also ensures that when we have multiple pid spaces mixed everything will work correctly. Signed-off-by: Eric W. Biederman <eric@maxwell.lnxi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:32 -08:00
Andries Brouwer	939b00df03	[PATCH] Minix V3 support This morning I needed to read a Minix V3 filesystem, but unfortunately my 2.6.19 did not support that, and neither did the downloaded 2.6.20rc4. Fortunately, Google told me that Daniel Aragones had already done the work, patch found at http://www.terra.es/personal2/danarag/ Unfortunately, looking at the patch was painful to my eyes, so I polished it a bit before applying. The resulting kernel boots, and reads the filesystem it needed to read. Signed-off-by: Daniel Aragones <danarag@gmail.com> Signed-off-by: Andries Brouwer <aeb@cwi.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:31 -08:00
Eric Dumazet	163da958ba	[PATCH] FS: speed up rw_verify_area() oprofile hunting showed a stall in rw_verify_area(), because of triple indirection and potential cache misses. (file->f_path.dentry->d_inode->i_flock) By moving initialization of 'struct inode' pointer before the pos/count sanity tests, we allow the compiler and processor to perform two loads by anticipation, reducing stall, without prefetch() hints. Even x86 arch has enough registers to not use temporary variables and not increase text size. I validated this patch running a bench and studied oprofile changes, and absolute perf of the test program. Results of my epoll_pipe_bench (source available on request) on a Pentium-M 1.6 GHz machine Before : # ./epoll_pipe_bench -l 30 -t 20 Avg: 436089 evts/sec read_count=8843037 write_count=8843040 21.218390 samples per call (best value out of 10 runs) After : # ./epoll_pipe_bench -l 30 -t 20 Avg: 470980 evts/sec read_count=9549871 write_count=9549894 21.216694 samples per call (best value out of 10 runs) oprofile CPU_CLK_UNHALTED events gave a reduction from 5.3401 % to 2.5851 % for the rw_verify_area() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Tomasz Kvarsin	3991d3bd15	[PATCH] warning fix: unsigned->signed While compiling my code with -Wconversion using gcc-trunk, I always get a bunch of warnings from headers; here is a fix for them: __getblk is always called with an unsigned argument, but it takes a signed one; the same story with __bread, __breadahead and so on. Signed-off-by: Tomasz Kvarsin Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Ahmed S. Darwish	79a81aef76	[PATCH] reiserfs: Use ARRAY_SIZE macro when appropriate Use ARRAY_SIZE macro already defined in kernel.h Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:29 -08:00
Nick Piggin	f9e4acf3be	[PATCH] inotify: read return val fix Fix for inotify read bug (bugzilla.kernel.org #6999) Problem Description: When reading from an inotify device with an insufficiently sized buffer, read(2) will return 0 with no errno set. This is a logically incorrect action on the part of the user program and should therefore return a more logical value. My suggestion is to return -EINVAL, as for bind(2). This patch is based on the proposal from Ryan <wolf0403@hotmail.com>, and feedback from John McCutchan <john@johnmccutchan.com>. Return -EINVAL if we have not passed in enough buffer space to read a single inotify event, rather than 0 which indicates that there is nothing to read. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: "John McCutchan" <john@johnmccutchan.com> Cc: Ryan <wolf0403@hotmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Christoph Hellwig	d003fb70fd	[PATCH] remove sb->s_files and file_list_lock usage in dquot.c Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref. This reduces list search and lock hold time as well as getting rid of one of the few uses of file_list_lock which Ingo identified as a scalability problem. Previously we called dq_op->initialize for every inode hanging off a writeable file that wasn't initialized before. Now we're calling it for every inode that has a non-zero i_writecount, aka a writeable file descriptor referring to it. Thanks a lot to Jan Kara for running this patch through his quota test harness. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Christoph Hellwig	fb58b7316a	[PATCH] move remove_dquot_ref to dquot.c Remove_dquot_ref can move to dquot.c instead of being in inode.c under #ifdef CONFIG_QUOTA. Also clean the resulting code up a tiny little bit by testing sb->dq_op earlier - it's constant over a filesystem's lifetime. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:28 -08:00
Andreas Gruenbacher	eb3dfb0cb1	[PATCH] Fix d_path for lazy unmounts Here is a bugfix to d_path. First, when d_path() hits a lazily unmounted mount point, it tries to prepend the name of the lazily unmounted dentry to the path name. It gets this wrong, and also overwrites the slash that separates the name from the following pathname component. This is demonstrated by the attached test case, which prints "getcwd returned d_path-bugsubdir" with the bug. The correct result would be "getcwd returned d_path-bug/subdir". It could be argued that the name of the root dentry should not be part of the result of d_path in the first place. On the other hand, what the unconnected namespace was once reachable as may provide some useful hints to users, and so that seems okay. Second, it isn't always possible to tell from the __d_path result whether the specified root and rootmnt (i.e., the chroot) was reached: lazy unmounts of bind mounts will produce a path that does start with a non-slash so we can tell from that, but other lazy unmounts will produce a path that starts with a slash, just like "ordinary" paths. The attached patch cleans up __d_path() to fix the bug with overlapping pathname components. It also adds a @fail_deleted argument, which allows to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock can then also be moved into __d_path(). The patch also makes sure that paths will only start with a slash for paths which are connected to the root and rootmnt. The @fail_deleted argument could be added to d_path() as well: this would allow callers to recognize deleted files, without having to resort to the ambiguous check for the " (deleted)" string at the end of the pathnames. This is not currently done, but it might be worthwhile. 
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Cc: Neil Brown <neilb@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
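The two properties this commit message asks of the path builder (a component must never overwrite the slash that separates it from the next one, and only paths connected to the root start with a slash) can be illustrated with a right-to-left buffer fill, which is the general style path builders of this kind use. The sketch below is a userspace illustration under that assumption; `prepend` and `build_path` are hypothetical names, not the kernel's `__d_path()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes of s immediately in front of *cursor, if there is room. */
static int prepend(char **cursor, size_t *room, const char *s, size_t n)
{
    if (*room < n)
        return -1;              /* would run off the front of the buffer */
    *room -= n;
    *cursor -= n;
    memcpy(*cursor, s, n);
    return 0;
}

/*
 * Hypothetical sketch: build a path from names[0..count-1] (outermost
 * first), filling buf from the end. A leading '/' is emitted only when
 * the path is connected to the root; separators between components are
 * always written as their own bytes, so no name can overwrite one.
 * Returns a pointer into buf, or NULL if buf is too small.
 */
char *build_path(char *buf, size_t len, const char *const *names,
                 size_t count, int connected)
{
    assert(buf != NULL);
    char *cursor = buf + len;
    size_t room = len;

    if (prepend(&cursor, &room, "", 1) < 0)     /* terminating NUL */
        return NULL;
    if (count == 0 && connected && prepend(&cursor, &room, "/", 1) < 0)
        return NULL;
    for (size_t i = count; i-- > 0; ) {
        if (prepend(&cursor, &room, names[i], strlen(names[i])) < 0)
            return NULL;
        if ((i > 0 || connected) && prepend(&cursor, &room, "/", 1) < 0)
            return NULL;
    }
    return cursor;
}
```

With the components `d_path-bug` and `subdir` and `connected` set to 0, the result is `d_path-bug/subdir`, matching the correct output quoted in the commit message.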
Robert P. J. Day	5c3bd438cc	[PATCH] NTFS: rename incorrect check of NTFS_DEBUG with just DEBUG Replace the incorrect debugging check of "#ifdef NTFS_DEBUG" with just "#ifdef DEBUG". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Andrew Morton	215122e111	[PATCH] register_chrdev_region() don't hand out the LOCAL/EXPERIMENTAL majors As pointed out in http://bugzilla.kernel.org/show_bug.cgi?id=7922, dynamic chardev major allocation can hand out majors which LANANA has defined as being for local/experimental use. Cc: Torben Mathiasen <device@lanana.org> Cc: Greg KH <greg@kroah.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tomas Klas <tomas.klas@mepatek.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
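The idea of skipping reserved majors during dynamic allocation can be sketched as follows. The three ranges (60-63, 120-127, 240-254) are what the kernel's device list marks as local/experimental to the best of my knowledge, and should be treated as an assumption here; the function names and the `in_use` table are hypothetical and do not reproduce the kernel's `chrdevs` handling.

```c
#include <assert.h>
#include <stddef.h>

#define MAJOR_COUNT 256

/* Assumed local/experimental ranges; verify against the device list. */
static int is_local_experimental_major(unsigned int major)
{
    return (major >= 60 && major <= 63) ||
           (major >= 120 && major <= 127) ||
           (major >= 240 && major <= 254);
}

/*
 * Hypothetical allocator: scan from the highest major downwards and
 * return the first one that is neither in use nor reserved for
 * local/experimental use. Returns -1 if nothing is available.
 */
int pick_dynamic_major(const unsigned char in_use[MAJOR_COUNT])
{
    assert(in_use != NULL);
    for (int i = MAJOR_COUNT - 1; i > 0; i--) {
        if (is_local_experimental_major((unsigned int)i))
            continue;           /* never hand these out dynamically */
        if (!in_use[i])
            return i;
    }
    return -1;
}
```

Once major 255 is taken, the next candidate in this sketch is 239 rather than 254, because the whole 240-254 block is skipped.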
David Chinner	6ab8eb1cff	[PATCH] Make XFS use BH_Unwritten and BH_Delay correctly Don't hide buffer_unwritten behind buffer_delay() and remove the hack that clears unexpected buffer_unwritten() states now that it can't happen. Signed-off-by: Dave Chinner <dgc@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Timothy Shimmin <tes@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
David Chinner	33a266dda9	[PATCH] Make BH_Unwritten a first class bufferhead flag V2 Currently, XFS uses BH_PrivateStart for flagging unwritten extent state in a bufferhead. Recently, I found the long standing mmap/unwritten extent conversion bug, and it was to do with partial page invalidation not clearing the unwritten flag from bufferheads attached to the page but beyond EOF. See here for a full explanation: http://oss.sgi.com/archives/xfs/2006-12/msg00196.html The solution I have checked into the XFS dev tree involves duplicating code from block_invalidatepage to clear the unwritten flag from the bufferhead(s), and then calling block_invalidatepage() to do the rest. Christoph suggested that this would be better solved by pushing the unwritten flag into the common buffer head flags and just adding the call to discard_buffer(): http://oss.sgi.com/archives/xfs/2006-12/msg00239.html The following patch makes BH_Unwritten a first class citizen. Signed-off-by: Dave Chinner <dgc@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:27 -08:00
Linus Torvalds	958b7f37ee	Merge git://oss.sgi.com:8090/xfs/xfs-2.6 * git://oss.sgi.com:8090/xfs/xfs-2.6: (33 commits) [XFS] Don't use kmap in xfs_iozero. [XFS] Remove a bunch of unused functions from XFS. [XFS] Remove unused arguments from the XFS_BTREE_*_ADDR macros. [XFS] Remove unused header files for MAC and CAP checking functionality. [XFS] Make freeze code a little cleaner. [XFS] Remove unused argument to xfs_bmap_finish [XFS] Clean up use of VFS attr flags [XFS] Remove useless memory barrier [XFS] XFS sysctl cleanups [XFS] Fix assertion in xfs_attr_shortform_remove(). [XFS] Fix callers of xfs_iozero() to zero the correct range. [XFS] Ensure a frozen filesystem has a clean log before writing the dummy [XFS] Fix sub-block zeroing for buffered writes into unwritten extents. [XFS] Re-initialize the per-cpu superblock counters after recovery. [XFS] Fix block reservation changes for non-SMP systems. [XFS] Fix block reservation mechanism. [XFS] Make growfs work for amounts greater than 2TB [XFS] Fix inode log item use-after-free on forced shutdown [XFS] Fix attr2 corruption with btree data extents [XFS] Workaround log space issue by increasing XFS_TRANS_PUSH_AIL_RESTARTS ...	2007-02-11 11:53:39 -08:00
Linus Torvalds	c827ba4cb4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Update defconfig. [SPARC64]: Add PCI MSI support on Niagara. [SPARC64] IRQ: Use irq_desc->chip_data instead of irq_desc->handler_data [SPARC64]: Add obppath sysfs attribute for SBUS and PCI devices. [PARTITION]: Add whole_disk attribute.	2007-02-11 11:37:45 -08:00
Alexey Dobriyan	4b98d11b40	[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct They are fat: 4x8 bytes in task_struct. They are unconditionally updated in every fork, read, write and sendfile. They are used only if you have some "extended acct fields feature". And please, please, please, read(2) knows about bytes, not characters, why is it called "rchar"? Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:07 -08:00
Dmitriy Monakhov	3e4fdaf8ae	[PATCH] jbd layer function called instead of fs specific one jbd function called instead of fs specific one. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Robert P. J. Day	730c385bc5	[PATCH] Remove unused kernel config option ZISOFS_FS Remove the kernel config option ZISOFS_FS, since it appears that the actual option is simply ZISOFS. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:18:06 -08:00
Nick Piggin	72ed3d0358	[PATCH] buffer: memorder fix unlock_buffer(), like unlock_page(), must not clear the lock without ensuring that the critical section is closed. Mingming later sent the same patch, saying: We are running the SDET benchmark and saw a double free issue for the ext3 extended attributes block, which complains that the same xattr block has already been freed (in ext3_xattr_release_block()). The problem can also be triggered by multiple threads looping untar/rm of a kernel tree. The race is caused by a missing memory barrier in unlock_buffer() before the lock bit is cleared, resulting in a possible concurrent h_refcount update. That causes a reference counter leak, which later leads to the double free that we have seen. Inside unlock_buffer(), a memory barrier is placed after the lock bit is cleared; however, there is no memory barrier before the bit is cleared. On some architectures the h_refcount update instruction and the clear bit instruction could be reordered, allowing the critical section to be re-entered. The race is like this: For example, if the h_refcount is initialized as 1, the steps interleave as follows: cpu 0: lock_buffer() /* test_and_set_bit */; cpu 0: clear_buffer_locked(bh); cpu 1: lock_buffer() /* test_and_set_bit */; cpu 0: h_refcount = h_refcount + 1; /* = 2 */ cpu 1: h_refcount = h_refcount + 1; /* = 2 */ cpu 1: clear_buffer_locked(bh); We lost an h_refcount here. We need a memory barrier before the buffer head lock bit is cleared to force the order of the two writes. Please apply. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 11:15:24 -08:00
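The ordering requirement in this commit (writes made inside the critical section must become visible before the lock bit is seen as clear) has a direct analogue in C11 atomics: acquire ordering when setting the bit, release ordering when clearing it. The sketch below is a userspace illustration with hypothetical names; the kernel fix uses its own barrier primitives around the bit operation, not `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BUF_LOCKED 1u

/* Hypothetical stand-in for a buffer head with a lock bit and a refcount. */
struct buf_head {
    atomic_uint state;
    unsigned int refcount;      /* protected by the BUF_LOCKED bit */
};

void buf_init(struct buf_head *bh, unsigned int refcount)
{
    atomic_init(&bh->state, 0u);
    bh->refcount = refcount;
}

/* Set the lock bit; acquire ordering keeps later accesses after the set. */
bool buf_trylock(struct buf_head *bh)
{
    unsigned int old = atomic_fetch_or_explicit(&bh->state, BUF_LOCKED,
                                                memory_order_acquire);
    return (old & BUF_LOCKED) == 0;
}

/*
 * Clear the lock bit with release ordering: every write made while the
 * lock was held (such as a refcount update) is ordered before the clear,
 * which is the guarantee the missing barrier failed to provide.
 */
void buf_unlock(struct buf_head *bh)
{
    assert(atomic_load_explicit(&bh->state, memory_order_relaxed) & BUF_LOCKED);
    atomic_fetch_and_explicit(&bh->state, ~BUF_LOCKED, memory_order_release);
}
```

If the clear used relaxed ordering instead, the refcount store could be reordered after it on weakly ordered hardware, reproducing the lost update shown in the race above.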
Robert P. J. Day	82ddcb0405	[PATCH] extend the set of "__attribute__" shortcut macros Extend the set of "__attribute__" shortcut macros, and remove identical (and now superfluous) definitions from a couple of source files. Based on a page at Robert Love's blog: http://rlove.org/log/2005102601 Extend the set of shortcut macros defined in compiler-gcc.h with the following: #define __packed __attribute__((packed)) #define __weak __attribute__((weak)) #define __naked __attribute__((naked)) #define __noreturn __attribute__((noreturn)) #define __pure __attribute__((pure)) #define __aligned(x) __attribute__((aligned(x))) #define __printf(a,b) __attribute__((format(printf,a,b))) Once these are in place, it's up to subsystem maintainers to decide if they want to take advantage of them. There is already a strong precedent for using shortcuts like this in the source tree. The ones that might give people pause are "__aligned" and "__printf", but shortcuts for both of those are already in use, and in some ways very confusingly. Note the two very different definitions for a macro named "ALIGNED": drivers/net/sgiseeq.c:#define ALIGNED(x) ((((unsigned long)(x)) + 0xf) & ~(0xf)) drivers/scsi/ultrastor.c:#define ALIGNED(x) __attribute__((aligned(x))) Also: include/acpi/platform/acgcc.h: #define ACPI_PRINTF_LIKE(c) __attribute__ ((__format__ (__printf__, c, c+1))) Given the precedent, then, it seems logical to at least standardize on a consistent set of these macros. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:35 -08:00
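To show what three of these shortcuts buy at a call site, here is a small userspace sketch for GCC or Clang. It deliberately uses `ATTR_`-prefixed names, which are an assumption made for this example, because double-underscore names such as `__packed` are reserved and some C libraries already define them.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration-only names; the commit defines __packed, __aligned, __printf. */
#define ATTR_PACKED        __attribute__((packed))
#define ATTR_ALIGNED(x)    __attribute__((aligned(x)))
#define ATTR_PRINTF(a, b)  __attribute__((format(printf, a, b)))

/* Packed: no padding is inserted between tag and length. */
struct wire_header {
    uint8_t  tag;
    uint32_t length;
} ATTR_PACKED;

/* Same members without the attribute; the compiler may pad after tag. */
struct padded_header {
    uint8_t  tag;
    uint32_t length;
};

/* Aligned: the buffer starts on a 16-byte boundary. */
static unsigned char scratch[32] ATTR_ALIGNED(16);

unsigned char *scratch_buffer(void)
{
    return scratch;
}

/* Printf-like: the compiler checks the arguments against fmt. */
ATTR_PRINTF(3, 4)
int format_message(char *buf, size_t len, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the format attribute in place, a mismatched call such as passing a string for `%d` draws a compile-time warning under `-Wformat`.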
Eric Sandeen	731b9a5498	[PATCH] remove ext[34]_inc_count and _dec_count - Naming is confusing, ext3_inc_count manipulates i_nlink not i_count - handle argument passed in is not used - ext3 and ext4 already call inc_nlink and dec_nlink directly in other places Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	2988a7740d	[PATCH] return ENOENT from ext3_link when racing with unlink Return -ENOENT from ext[34]_link if we've raced with unlink and i_nlink is 0. Doing otherwise has the potential to corrupt the orphan inode list, because we'd wind up with an inode with a non-zero link count on the list, and it will never get properly cleaned up & removed from the orphan list before it is freed. [akpm@osdl.org: build fix] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Hugh Dickins	2e7842b887	[PATCH] fix umask when noACL kernel meets extN tuned for ACLs Fix insecure default behaviour reported by Tigran Aivazian: if an ext2 or ext3 or ext4 filesystem is tuned to mount with "acl", but mounted by a kernel built without ACL support, then umask was ignored when creating inodes - though root or user has umask 022, touch creates files as 0666, and mkdir creates directories as 0777. This appears to have worked right until 2.6.11, when a fix to the default mode on symlinks (always 0777) assumed VFS applies umask: which it does, unless the mount is marked for ACLs; but ext[234] set MS_POSIXACL in s_flags according to s_mount_opt set according to def_mount_opts. We could revert to the 2.6.10 ext[234]_init_acl (adding an S_ISLNK test); but other filesystems only set MS_POSIXACL when ACLs are configured. We could fix this at another level; but it seems most robust to avoid setting the s_mount_opt flag in the first place (at the expense of more ifdefs). Likewise don't set the XATTR_USER flag when built without XATTR support. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Alexey Dobriyan	9bbf81e483	[PATCH] seq_file conversion: coda Compile-tested. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <jaharkes@cs.cmu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	ead6596b9e	[PATCH] ext4: refuse ro to rw remount of fs with orphan inodes In the rare case where we have skipped orphan inode processing due to a readonly block device, and the block device subsequently changes back to read-write, disallow a remount,rw transition of the filesystem when we have an unprocessed orphan inodes as this would corrupt the list. Ideally we should process the orphan inode list during the remount, but that's trickier, and this plugs the hole for now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Eric Sandeen	ea9a05a133	[PATCH] ext3: refuse ro to rw remount of fs with orphan inodes In the rare case where we have skipped orphan inode processing due to a readonly block device, and the block device subsequently changes back to read-write, disallow a remount,rw transition of the filesystem when we have an unprocessed orphan inodes as this would corrupt the list. Ideally we should process the orphan inode list during the remount, but that's trickier, and this plugs the hole for now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:34 -08:00
Andrew Morton	100bb9349e	[PATCH] proc_misc warning fix fs/proc/proc_misc.c: In function 'proc_misc_init': fs/proc/proc_misc.c:764: warning: unused variable 'entry' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Olaf Hering	a470e18f53	[PATCH] msdos partitions: fix logic error in AIX detection Correct the AIX magic check to let 'echo > /dev/sdb' actually work. Signed-off-by: Olaf Hering <olh@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Olaf Hering	4419d1ac7d	[PATCH] relax check for AIX in msdos partition table The patch to identify AIX disks and ignore them has caused at least one machine to fail to find the root partition on 2.6.19. The patch is: http://lkml.org/lkml/2006/7/31/117 The problem is some disk formatters do not blow away the first 4 bytes of the disk. If the disk we are installing to used to have AIX on it, then the first 4 bytes will still have IBMA in EBCDIC. The install in question was debian etch. Im not sure what the best fix is, perhaps the AIX detection code could check more than the first 4 bytes. The whole partition info for primary partitions is in this block: dd if=/dev/sdb count=$(( 4 * 16 )) bs=1 skip=$(( 0x1be )) All other data do not matter, beside the 0x55aa marker at the end of the first block. Signed-off-by: Olaf Hering <olh@suse.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Andrew Morton	fc0ecff698	[PATCH] remove invalidate_inode_pages() Convert all calls to invalidate_inode_pages() into open-coded calls to invalidate_mapping_pages(). Leave the invalidate_inode_pages() wrapper in place for now, marked as deprecated. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:31 -08:00
Eric Sandeen	d8adb9cef7	[PATCH] ext2: skip pages past number of blocks in ext2_find_entry This one was pointed out on the MOKB site: http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html If a directory's i_size is corrupted, ext2_find_entry() will keep processing pages until the i_size is reached, even if there are no more blocks associated with the directory inode. This patch puts in some minimal sanity-checking so that we don't keep checking pages (and issuing errors) if we know there can be no more data to read, based on the block count of the directory inode. This is somewhat similar in approach to the ext3 patch I sent earlier this year. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:28 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Jan Blunck	4a3b0a490d	[PATCH] igrab() should check for I_CLEAR When igrab() is calling __iget() on an inode it should check if clear_inode() has been called on the inode already. Otherwise there is a race window between clear_inode() and destroy_inode() where igrab() calls __iget() which leads to already free inodes on the inode lists. Signed-off-by: Vandana Rungta <vandana@novell.com> Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:26 -08:00
Eric Dumazet	37756ced1f	[PATCH] avoid one conditional branch in touch_atime() I added IS_NOATIME(inode) macro definition in include/linux/fs.h, true if the inode superblock is marked readonly or noatime. This new macro is then used in touch_atime() instead of separatly testing MS_RDONLY and MS_NOATIME Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:25 -08:00
Ken Chen	4662629631	[PATCH] convert ramfs to use __set_page_dirty_no_writeback As pointed out by Hugh, ramfs would also benefit from using the new set_page_dirty aop method for memory backed file systems. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:19 -08:00
Christoph Lameter	65e458d43d	[PATCH] Drop get_zone_counts() Values are available via ZVC sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:18 -08:00
Robert P. J. Day	e10a4437cb	[PATCH] Remove final references to deprecated "MAP_ANON" page protection flag Remove the last vestiges of the long-deprecated "MAP_ANON" page protection flag: use "MAP_ANONYMOUS" instead. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:17 -08:00
Fabio Massimo Di Nitto	d18d7682c1	[PARTITION]: Add whole_disk attribute. Some partitioning systems create special partitions that span the entire disk. One example are Sun partitions, and this whole-disk partition exists to tell the firmware the extent of the entire device so it can load the boot block and do other things. Such partitions should not be treated as normal partitions, because all the other partitions overlap this whole-disk one. So we'd see multiple instances of the same UUID etc. which we do not want. udev and friends can thus search for this 'whole_disk' attribute and use it to decide to ignore the partition. Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:50:00 -08:00
David Chinner	e7ff6aed87	[XFS] Don't use kmap in xfs_iozero. kmap() is inefficient and does not scale well. kmap_atomic() is a better choice. Use the generic wrapper function instead of open coding the kmap-memset-dcache flush-kunmap stuff. SGI-PV: 960904 SGI-Modid: xfs-linux-melb:xfs-kern:28041a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:46 +11:00
Eric Sandeen	6be145bfb1	[XFS] Remove a bunch of unused functions from XFS. Patch provided by Eric Sandeen (sandeen@sandeen.net). SGI-PV: 960897 SGI-Modid: xfs-linux-melb:xfs-kern:28038a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-02-10 18:37:40 +11:00
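
The "__attribute__" shortcut macros listed in commit 82ddcb0405 can be tried outside the kernel tree. The following is a minimal userspace sketch, assuming GCC or Clang; it is not the contents of compiler-gcc.h. The macro bodies are copied from the commit message, while the `#ifndef` guards, `struct demo_record`, `struct demo_buf`, and `demo_log()` are hypothetical names added for illustration only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Definitions as quoted in the commit message. The #ifndef guards are an
 * addition for userspace builds, where a libc header may already define
 * some of these names.
 */
#ifndef __packed
#define __packed __attribute__((packed))
#endif
#ifndef __aligned
#define __aligned(x) __attribute__((aligned(x)))
#endif
#ifndef __printf
#define __printf(a, b) __attribute__((format(printf, a, b)))
#endif

/*
 * Hypothetical on-disk record. Without __packed the compiler would
 * typically insert padding between 'tag' and 'value'.
 */
struct demo_record {
	uint8_t  tag;
	uint32_t value;
} __packed;

/* Hypothetical buffer whose instances start on a 16-byte boundary. */
struct demo_buf {
	char data[8];
} __aligned(16);

/*
 * Hypothetical logger. __printf(3, 4) tells the compiler that argument 3
 * is a printf-style format string and that the variadic arguments start
 * at position 4, so mismatched callers can be diagnosed at compile time.
 */
__printf(3, 4)
int demo_log(char *out, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(out != NULL && len > 0);
	va_start(ap, fmt);
	n = vsnprintf(out, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With these definitions, `sizeof(struct demo_record)` comes out as 5 rather than the padded size, and a call such as `demo_log(buf, sizeof(buf), "%d", "text")` should draw a format warning when the compiler's format checking is enabled.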

4888 Commits
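
Commit 4419d1ac7d describes the first disk block in terms of byte offsets: four 16-byte primary partition entries starting at 0x1be, the 0x55aa marker at the end of the block, and "IBMA" in EBCDIC in the first 4 bytes of a former AIX disk. The sketch below spells out that arithmetic for a 512-byte block. It is an illustration only, not the kernel's detection code; all function names are hypothetical, and the per-entry field offsets (type at byte 4, little-endian start LBA at bytes 8..11) come from the conventional MBR layout rather than from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	SECTOR_SIZE  = 512,
	TABLE_OFFSET = 0x1be, /* 446: first primary partition entry */
	ENTRY_SIZE   = 16,
	ENTRY_COUNT  = 4      /* 446 + 4 * 16 = 510, where the marker starts */
};

/* The last two bytes of the first block hold 0x55 0xaa. */
int has_boot_marker(const uint8_t *sec)
{
	return sec[SECTOR_SIZE - 2] == 0x55 && sec[SECTOR_SIZE - 1] == 0xaa;
}

/* "IBMA" encoded in EBCDIC is c9 c2 d4 c1. */
int has_aix_magic(const uint8_t *sec)
{
	return sec[0] == 0xc9 && sec[1] == 0xc2 &&
	       sec[2] == 0xd4 && sec[3] == 0xc1;
}

/* Pointer to primary entry i (0..3) inside the block. */
const uint8_t *mbr_entry(const uint8_t *sec, int i)
{
	assert(i >= 0 && i < ENTRY_COUNT);
	return sec + TABLE_OFFSET + (size_t)i * ENTRY_SIZE;
}

/* Partition type byte; 0 conventionally marks an unused entry. */
uint8_t entry_type(const uint8_t *sec, int i)
{
	return mbr_entry(sec, i)[4];
}

/* Starting sector, stored little-endian in bytes 8..11 of the entry. */
uint32_t entry_start_lba(const uint8_t *sec, int i)
{
	const uint8_t *e = mbr_entry(sec, i);

	return (uint32_t)e[8] | (uint32_t)e[9] << 8 |
	       (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
}
```

A block left over from AIX can pass both `has_aix_magic()` and `has_boot_marker()` at once, which is the situation the commit message describes: the formatter wrote a fresh partition table but did not clear the first 4 bytes.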