Commit Graph

168837 Commits

Author SHA1 Message Date
Jiro SEKIBA
565de406e7 nilfs2: expand inode_inc_link_count and inode_dec_link_count
This is an intermidiate patch to reduce redandunt mark_inode_dirty() calls
by calling inode_inc_link_count() and inode_dec_link_count() functions.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA
43f8bc262f nilfs2: delete mark_inode_dirty from nilfs_set_link
Delete mark_inode_dirty() from nilfs_set_link() to reduce redundant
mark_inode_dirty() calls in caller of nilfs_set_link().

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Jiro SEKIBA
9ca941d4b6 nilfs2: delete mark_inode_dirty in nilfs_new_inode
It is redundant to call mark_inode_dirty() in nilfs_new_inode() because
all caller of nilfs_new_inode() will call mark_inode_dirty()
after calling nilfs_new_inode() directly or indirectly in transaction.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-27 20:05:15 +09:00
Ryusuke Konishi
0234576d04 nilfs2: add norecovery mount option
This adds "norecovery" mount option which disables temporal write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be even performed for those
types of mounts; the temporal write access is needed to mount root
file system read-only after an unclean shutdown.

This option will be helpful when user wants to prevent any write
access to the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
a057d2c011 nilfs2: add helper to get if volume is in a valid state
This adds a helper function, nilfs_valid_fs() which returns if nilfs
is in a valid state or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
f50a4c8149 nilfs2: move recovery completion into load_nilfs function
Although mount recovery of nilfs is integrated in load_nilfs()
procedure, the completion of recovery was isolated from the procedure
and performed at the end of the fill_super routine.

This was somewhat confusing since the recovery is needed for the nilfs
object, not for a super block instance.

To resolve the inconsistency, this will integrate the recovery
completion into load_nilfs().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
050b4142c9 nilfs2: apply readahead for recovery on mount
This inserts readahead in the recovery code.  The readahead request is
issued per segment while searching the latest super root block.

This will shorten mount time after unclean unmount.  A measurement
shows the recovery time was reduced by more than 60 percent:

 e.g. real  0m11.586s -> 0m3.918s  (x 2.96)

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
f021759d74 nilfs2: clean up get/put function of a segment usage
This eliminates obsolete nilfs_get_sufile_get_segment_usage() and
nilfs_set_sufile_segment_usage() from sufile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
071ec54dd7 nilfs2: move routine to set segment usage into sufile
This adds nilfs_sufile_set_segment_usage() function in sufile to
replace direct access to the sufile metadata in log writer code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
61a189e9c6 nilfs2: move routine marking segment usage dirty into sufile
This adds nilfs_sufile_mark_dirty() function in sufile to replace
nilfs_touch_segusage() function in log writer code.  This is a
preparation for the further cleanup which will move out low level
sufile operations in the log writer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
70622a2091 nilfs2: insert cache operation in palloc get block routines
This implements cache operation in get block routines of palloc code:
nilfs_palloc_get_desc_block(), nilfs_palloc_get_bitmap_block(), and
nilfs_palloc_get_entry_block().

This will complete the palloc cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
49fa7a5902 nilfs2: add palloc cache to ifile
This adds the palloc cache to ifile.  The palloc cache is allocated on
the extended region of nilfs_mdt_info struct.  The struct
nilfs_ifile_info defines the extended on memory structure of ifile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
c3ea56c800 nilfs2: flush palloc cache before manipulating data pages of GC dat
Data pages in gcdat metadata file (i.e. the secondary DAT for GC), are
cleared or even moved back to the normal DAT when a shot of garbage
collection was done.

Buffer heads held by the palloc cache of gcdat must be cleared before
these page cache manipulation.  This adds nilfs_palloc_clear_cache()
to ensure this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
8908b2f70b nilfs2: add palloc cache to dat
This adds the palloc cache to DAT file.  The palloc cache is allocated
on the extended region of nilfs_mdt_info struct.  The struct
nilfs_dat_info defines the extended on memory structure of DAT.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
db38d5ad32 nilfs2: add cache framework for persistent object allocator
This adds setup and cleanup routines of the persistent object
allocator cache.

According to ftrace analyses, accessing buffers of the DAT file
suffers indispensable overhead many times.  To mitigate the overhead,
This introduce cache framework for the persistent object allocator
(palloc) which the DAT file and ifile are using.

struct nilfs_palloc_cache represents the cache object per metadata
file using palloc.

The cache is initialized through nilfs_palloc_setup_cache() and
destroyed by nilfs_palloc_destroy_cache(); callers of the former
function will be added to individual allocators of DAT and ifile on
successive patches.

nilfs_palloc_destroy_cache() will be called from nilfs_mdt_destroy()
if the cache is attached to a metadata file.  A companion function
nilfs_palloc_clear_cache() is provided to allow releasing buffer head
references independently with the cleanup task.  This adjunctive
function will be used before invalidating pages of metadata file with
the cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
141bbdba9c nilfs2: unfold nilfs_palloc_block_get_bitmap function
This expands a trivial address calculation in the function into its
every callsite. This expansion improves readability of the callers.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
1376e931b7 nilfs2: eliminate nilfs_btnode_get function
This removes the obsolete nilfs_btnode_get() function and makes
nilfs_btree_get_block() directly call nilfs_btnode_submit_block().

This expansion will provide better opportunity for code optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
75f65edfcc nilfs2: remove newblk argument from nilfs_btnode_submit_block
This removes the obsolete argument from nilfs_btnode_submit_block().
This will complete separating a create function of btree node.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
45f4910bc0 nilfs2: use nilfs_btnode_create_block function
This displaces nilfs_btnode_get() use to create new btree node block
with nilfs_btnode_create_block.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
d501d73689 nilfs2: separate function for creating new btree node block
Adds a separate routine for creating a btree node block.  This is a
preparation to reduce the depth of function calls during submitting
btree node buffer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
b34a65069c nilfs2: avoid readahead on metadata file for create mode
This turns off readhead action of metadata file if nilfs_mdt_get_block
function was called with a create flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
ef7d4757a5 nilfs2: simplify nilfs_sufile_get_ncleansegs function
Previously, this function took an status code to return possible error
codes.  The ("nilfs2: add local variable to cache the number of clean
segments") patch removed the possibility to return errors.

So, this simplifies the function definition to make it directly return
the number of clean segments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
aa474a2201 nilfs2: add local variable to cache the number of clean segments
This makes it possible for sufile to get the number of clean segments
faster.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
7b16c8a211 nilfs2: unfold nilfs_sufile_block_get_header function
This unfolds the nilfs_sufile_block_get_header() function for
simplicity.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
fd66c0d5c3 nilfs2: hide nilfs_mdt_clear calls in nilfs_mdt_destroy
This will hide a function call of nilfs_mdt_clear() in
nilfs_mdt_destroy().

This ensures nilfs_mdt_destroy() to do cleanup jobs included in
nilfs_mdt_clear().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
3961f0e277 nilfs2: eliminate inlines to directly read/write inode of metadata files
Removes two inline functions: nilfs_mdt_read_inode_direct() and
nilfs_mdt_write_inode_direct().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
8707df3847 nilfs2: separate read method of meta data files on super root block
Will displace nilfs_mdt_read_inode_direct function with an individual
read method: nilfs_dat_read, nilfs_sufile_read, nilfs_cpfile_read.

This provides the opportunity to initialize local variables of each
metadata file after reading the inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
79739565e1 nilfs2: separate constructor of metadata files
This will displace nilfs_mdt_new() constructor with individual
metadata file constructors like nilfs_dat_new(), new_sufile_new(),
nilfs_cpfile_new(), and nilfs_ifile_new().

This makes it possible for each metadata file to have own
intialization code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
5731e191f2 nilfs2: add size option of private object to metadata file allocator
This adds an optional "object size" argument to nilfs_mdt_new_common()
function; the argument specifies the size of private object attached
to a newly allocated metadata file inode.

This will afford space to keep local variables for meta data files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:48 +09:00
Ryusuke Konishi
9cb4e0d2b9 nilfs2: move out mark_inode_dirty calls from bmap routines
Previously, nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() called
mark_inode_dirty() after they changed the number of data blocks.

This moves these calls outside bmap outermost functions like
nilfs_bmap_insert() or nilfs_bmap_truncate().

This will mitigate overhead for truncate or delete operation since
they repeatedly remove set of blocks.  Nearly 10 percent improvement
was observed for removal of a large file:

 # dd if=/dev/zero of=/test/aaa bs=1M count=512
 # time rm /test/aaa

  real  2.968s -> 2.705s

Further optimization may be possible by eliminating these
mark_inode_dirty() uses though I avoid mixing separate changes here.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi
09bf4aae0a nilfs2: stop marking metadata inode dirty within btree operations
Since metadata file routines mark the inode dirty after they
successfully changed bmap objects, nilfs_mdt_mark_dirty() calls in
nilfs_bmap_add_blocks() and nilfs_bmap_sub_blocks() are redundant.

This removes these overlapping calls from the bmap routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi
30db4e6c3d nilfs2: remove buffer locking from btree code
lock_buffer() and unlock_buffer() uses in btree.c are eliminable
because btree functions gain buffer heads through nilfs_btnode_get(),
which never returns an on-the-fly buffer.

Although nilfs_clear_dirty_page() and nilfs_copy_back_pages() in
nilfs_commit_gcdat_inode() juggle btree node buffers of DAT, this is
safe because these operations are protected by a log writer lock or
the metadata file semaphore of DAT.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Ryusuke Konishi
a49762fd11 nilfs2: remove buffer locking in nilfs_mark_inode_dirty
This lock is eliminable because inodes on the buffer can be updated
independently.  Although a log writer also fills in bmap data on the
on-disk inodes, this update is exclusively done by a log writer lock.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA
e2073e7857 nilfs2: cleanup unused match_bool function
match_bool function is not used anymore.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA
91f1953bf3 nilfs2: Using nobarrier option instead of barrier=off
Since most of fs using nofoobar style option,
modified barrier=off option as nobarrier.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Jiro SEKIBA
6600b9dd8e nilfs2: move definition of struct nilfs_btree_node
This is a trivial patch to expose struct nilfs_fs_btree_node.
The struct should be exposed outside of kernel, for it is disk format.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:46 +09:00
Ryusuke Konishi
9b945d537d nilfs2: get rid of BUG_ON use in btree lookup routines
The current btree lookup routines make a kernel oops when detected
inconsistency in btree blocks.  These routines should instead return a
proper error code because the inconsistency usually comes from
corruption of on-disk metadata.

This fixes the issue by converting BUG_ON calls to proper error
handlings.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:46 +09:00
Linus Torvalds
648f4e3e50 Linux 2.6.32-rc8 2009-11-19 14:32:38 -08:00
Linus Torvalds
e6236f781c Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr()
  NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
2009-11-19 13:43:19 -08:00
Alan Cox
308efab5e2 vt: Fix use of "new" in a struct field
As this struct is exposed to user space and the API was added for this
release it's a bit of a pain for the C++ world and we still have time to
fix it. Rename the fields before we end up with that pain in an actual
release.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Reported-by: Olivier Goffart
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-19 13:43:06 -08:00
David Woodhouse
5854d9c8d1 Fix handling of the HP/Acer 'DMAR at zero' BIOS error for machines with <4GiB RAM.
Commit 86cf898e1d ("intel-iommu: Check for
'DMAR at zero' BIOS error earlier.") was supposed to work by pretending
not to detect an IOMMU if it was actually being reported by the BIOS at
physical address zero.

However, the intel_iommu_init() function is called unconditionally, as
are the corresponding functions for other IOMMU hardware.

So the patch only worked if you have RAM above the 4GiB boundary. It
caused swiotlb to be initialised when no IOMMU was detected during early
boot, and thus the later IOMMU init would refuse to run.

But if you have less RAM than that, swiotlb wouldn't get set up and the
IOMMU _would_ still end up being initialised, even though we never
claimed to detect it.

This patch also sets the dmar_disabled flag when the error is detected
during the initial detection phase -- so that the later call to
intel_iommu_init() will return without doing anything, regardless of
whether swiotlb is used or not.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-19 13:42:02 -08:00
Linus Torvalds
66b00a7c93 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix stale cpufreq_cpu_governor pointer
  [CPUFREQ] Resolve time unit thinko in ondemand/conservative govs
  [CPUFREQ] speedstep-ich: fix error caused by 394122ab14
  [CPUFREQ] Fix use after free on governor restore
  [CPUFREQ] acpi-cpufreq: blacklist Intel 0f68: Fix HT detection and put in notification message
  [CPUFREQ] powernow-k8: Fix test in get_transition_latency()
  [CPUFREQ] longhaul: select Longhaul version 2 for capable CPUs
2009-11-18 18:49:49 -08:00
Linus Torvalds
a414f01ac2 strcmp: fix overflow and possibly signedness error
Doing the strcmp return value as

	signed char __res = *cs - *ct;

is wrong for two reasons.  The subtraction can overflow because __res
doesn't use a type big enough.  Moreover the compared bytes should be
interpreted as unsigned char as specified by POSIX.

The same problem is fixed in strncmp.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-18 17:18:13 -08:00
Linus Torvalds
6602b355c2 Merge branch 'agp-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp/intel-agp: Set dma_mask for capable chipsets before agp_add_bridge()
2009-11-18 17:08:16 -08:00
Linus Torvalds
7f6f3507fd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  ima: replace GFP_KERNEL with GFP_NOFS
2009-11-18 16:58:34 -08:00
David Woodhouse
ec402ba97a agp/intel-agp: Set dma_mask for capable chipsets before agp_add_bridge()
We should set this before calling agp_add_bridge() so that it's done
before we map the scratch page too.

This should probably fix the regression reported as k.o. bug #14627.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-11-19 10:34:30 +10:00
Linus Torvalds
d22966d067 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  OMAP: cs should be positive in gpmc_cs_free()
  omap: fix unlikely(x) < y
  omap3: clock: Fixed dpll3_m2x2 rate calculation
  omap3: clock: Fix the DPLL freqsel computations
  omap: Fix keymap for zoom2 according to matrix keypad framwork
2009-11-18 15:00:21 -08:00
Linus Torvalds
70b172b298 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: tlv320aic23 fix rate selection
  ASoC: OMAP3 Pandora: update for TWL4030 codec changes
  ASoC: Modifying the license string GPLv2 for OMAP3 EVM
  ALSA: hda - Fix quirk for VAIO type G
  ALSA: usb - Quirk to disable master volume control in PCM2702
2009-11-18 14:59:49 -08:00
Linus Torvalds
486bfe5c7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  cxgb3: fix premature page unmap
  ibm_newemac: Fix EMACx_TRTR[TRT] bit shifts
  vlan: Fix register_vlan_dev() error path
  gro: Fix illegal merging of trailer trash
  sungem: Fix Serdes detection.
  net: fix mdio section mismatch warning
  ppp: fix BUG on non-linear SKB (multilink receive)
  ixgbe: Fixing EEH handler to handle more than one error
  net: Fix the rollback test in dev_change_name()
  Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."
  TI Davinci EMAC : Fix Console Hang when bringing the interface down
  smsc911x: Fix Console Hang when bringing the interface down.
  mISDN: fix error return in HFCmulti_init()
  forcedeth: mac address fix
  r6040: fix version printing
  Bluetooth: Fix regression with L2CAP configuration in Basic Mode
  Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
  Bluetooth: Set general bonding security for ACL by default
  r8169: Fix receive buffer length when MTU is between 1515 and 1536
  can: add the missing netlink get_xstats_size callback
  ...
2009-11-18 14:54:45 -08:00
Mimi Zohar
c09c59e6a0 ima: replace GFP_KERNEL with GFP_NOFS
While running fsstress tests on the NFSv4 mounted ext3 and ext4
filesystem, the following call trace was generated on the nfs
server machine.

Replace GFP_KERNEL with GFP_NOFS in ima_iint_insert() to avoid a
potential deadlock.

     =================================
    [ INFO: inconsistent lock state ]
    2.6.31-31.el6.x86_64 #1
    ---------------------------------
    inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    kswapd2/75 [HC0[0]:SC0[0]:HE1:SE1] takes:
     (jbd2_handle){+.+.?.}, at: [<ffffffff811edd5e>] jbd2_journal_start+0xfe/0x13f
    {RECLAIM_FS-ON-W} state was registered at:
      [<ffffffff81091e40>] mark_held_locks+0x65/0x99
      [<ffffffff81091f31>] lockdep_trace_alloc+0xbd/0xf5
      [<ffffffff81126fdd>] kmem_cache_alloc+0x40/0x185
      [<ffffffff812344d7>] ima_iint_insert+0x3d/0xf1
      [<ffffffff812345b0>] ima_inode_alloc+0x25/0x44
      [<ffffffff811484ac>] inode_init_always+0xec/0x271
      [<ffffffff81148682>] alloc_inode+0x51/0xa1
      [<ffffffff81148700>] new_inode+0x2e/0x94
      [<ffffffff811b2f08>] ext4_new_inode+0xb8/0xdc9
      [<ffffffff811be611>] ext4_create+0xcf/0x175
      [<ffffffff8113e2cd>] vfs_create+0x82/0xb8
      [<ffffffff8113f337>] do_filp_open+0x32c/0x9ee
      [<ffffffff811309b9>] do_sys_open+0x6c/0x12c
      [<ffffffff81130adc>] sys_open+0x2e/0x44
      [<ffffffff81011e42>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
    irq event stamp: 90371
    hardirqs last  enabled at (90371): [<ffffffff8112708d>]
    kmem_cache_alloc+0xf0/0x185
    hardirqs last disabled at (90370): [<ffffffff81127026>]
    kmem_cache_alloc+0x89/0x185
    softirqs last  enabled at (89492): [<ffffffff81068ecf>]
    __do_softirq+0x1bf/0x1eb
    softirqs last disabled at (89477): [<ffffffff8101312c>] call_softirq+0x1c/0x30

    other info that might help us debug this:
    2 locks held by kswapd2/75:
     #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810f98ba>] shrink_slab+0x44/0x177
     #1:  (&type->s_umount_key#25){++++..}, at: [<ffffffff811450ba>]

Reported-by: Muni P. Beerakam <mbeeraka@in.ibm.com>
Reported-by: Amit K. Arora <amitarora@in.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-11-19 08:42:01 +11:00