linux

Author	SHA1	Message	Date
Johan Rudholm	d887874e0e	mmc: core: Add card_busy to host_ops This host_ops member is used to test if the card is signaling busy by pulling dat[0:3] low. Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Wei WANG <wei_wang@realsil.com.cn> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:07 -05:00
Johan Rudholm	276e090f92	mmc: core: Add mmc_power_cycle Add mmc_power_cycle which can be used to power cycle for instance SD-cards. Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Wei WANG <wei_wang@realsil.com.cn> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:06 -05:00
Johan Rudholm	3f8a7fabd6	mmc: sd: Simplify by using mmc_host_uhs Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:05 -05:00
Balaji T K	d0123ccac5	mmc: core: expose RPMB partition only for CMD23 capable hosts SET_BLOCK_COUNT CMD23 is needed for all access to RPMB partition. If block count is not set by CMD23, all subsequent read/write commands fail as per eMMC specification. So, If the host does not support CMD23, do not expose RPMB partition. Accessing RPMB partition can cause hang / huge delay for hosts which do not support CMD23. Signed-off-by: Balaji T K <balajitk@ti.com> Reported-and-Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:05 -05:00
Mike Lockwood	85c34d2e7b	mmc: goldfish: emulated MMC device This driver handles the virtual MMC device present in the Goldfish emulator. The patch folds together initial work from Mike Lockwood and patches by San Mehat, Jun Nakajima and Tom Keel <thomas.keel@intel.com> plus cleanups by Alan Cox to get it all into 3.6 shape. Signed-off-by: Mike A. Chan <mikechan@google.com> [cleaned up and x86 support added] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Xiaohui Xin <xiaohui.xin@intel.com> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Bruce Beare <bruce.j.beare@intel.com> [Moved to 3.4] Signed-off-by: Tom Keel <thomas.keel@intel.com> [Moved to 3.7] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:04 -05:00
Shawn Guo	28c2a62bd5	mmc: dt: bus-width can be an optional property None of mmc drivers implements bus-width as a required device tree property. Instead, some drivers like atmel-mci, dw_mmc, sdhci-s3c implement it as an optional one, and will force bus width to be 1 when the property is absent. Let's change the common binding to reflect what the drivers are usually doing. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:04 -05:00
Sascha Hauer	af51079e68	mmc: sdhci-esdhc-imx: support 8bit mode The i.MX esdhc has a nonstandard bit layout for the SDHCI_HOST_CONTROL register. To support 8bit bus width on i.MX populate the platform_bus_width callback. This is tested on an i.MX25, but should according to the datasheets work on the other i.MX using this hardware aswell. The i.MX6, while having a SDHCI_SPEC_300 controller, still uses the same nonstandard register layout. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:03 -05:00
Sascha Hauer	7bc088d38f	mmc: sdhci: rename platform_8bit_width to platform_bus_width The 8bit in the function name is misleading. When set, it will be used to set the bus width, regardless of whether 8bit or another bus width is requested, so change the function name to platform_bus_width. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:02 -05:00
Shawn Guo	2a15f981ae	mmc: sdhci-esdhc-imx: Auto CMD23 support for usdhc SDHCI core will try to use Auto CMD23 for mmc card. Currently, we will see the following message with mmc card on usdhc due to the lacking of Auto CMD23 support in the driver. $ mmc0: new high speed MMC card at address 0001 mmcblk1: mmc0:0001 MMC02G 1.87 GiB mmcblk1: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb00 mmcblk1: retrying using single block read mmcblk1: Enable Auto CMD23 support for usdhc so that mmc card can work in multiple block mode. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:02 -05:00
Shawn Guo	58c8c4fbdb	mmc: sdhci-esdhc-imx: manually reset MIX_CTRL for usdhc It's another violation to SDHC spec that software reset on usdhc does not reset MIX_CTRL register. Have to do it manually, otherwise the preserving of the register bits (e.g. AC23EN) may cause mmc card fail to be initialized. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:01 -05:00
Shawn Guo	69f5469822	mmc: sdhci-esdhc-imx: separate transfer mode from command write for usdhc The combining of SDHCI_TRANSFER_MODE and SDHCI_COMMAND writes is only required for esdhc, but not necessarily for usdhc. Different from esdhc where the bits for transfer mode and command are all in the same register CMD_XFR_TYP, usdhc has a newly introduced register MIX_CTRL to hold transfer mode bits. So it makes more sense to separate transfer mode from command write for usdhc. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:01 -05:00
Maya Erez	881d926d9d	mmc: core: move the cache disabling operation to mmc_suspend Cache control is an eMMC feature and in therefore should be part of MMC's bus resume operations, performed in mmc_suspend, rather than in the generic mmc_suspend_host(). Signed-off-by: Maya Erez <merez@codeaurora.org> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:37:00 -05:00
Seungwon Jeon	f9e37137e4	MAINTAINERS: mmc: add maintainer entry for dw_mmc driver Add maintainer entry for the Synopsys DW host driver which is used in various SOC including EXYNOS series. As Will Newton will no longer be able to take care of dw_mmc*, I and Jaehoon Chung are willing to maintain it. Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:59 -05:00
Fabio Estevam	da86a5d4ef	mmc: sdhci-esdhc-imx: Remove unused variables 3f175a6e5 (mmc: sdhci-esdhc-imx: remove ESDHC_CD_GPIO handling from IO accessory) introduced the following build warnings: drivers/mmc/host/sdhci-esdhc-imx.c:149:30: warning: unused variable 'boarddata' [-Wunused-variable] drivers/mmc/host/sdhci-esdhc-imx.c:181:30: warning: unused variable 'boarddata' [-Wunused-variable] Remove the unused variables. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:59 -05:00
Guennadi Liakhovetski	b477426e37	mmc: fix DT binding documentation SDHCI left-over The file Documentation/devicetree/bindings/mmc/mmc.txt is common for all MMC host drivers. Use a generic MMC host reference instead of an SDHCI left-over. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:58 -05:00
Shawn Guo	60bf6396fb	mmc: sdhci-esdhc-imx: name esdhc specific definitions with ESDHC_ prefix Rename esdhc local definitions with ESDHC_ rather than SDHCI_ prefix, so that we can distinguish them from SDHCI core definitions from name. A couple of bit fields are also changed use shift for consistency and better readability. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:58 -05:00
Shawn Guo	6b40d18295	mmc: sdhci-esdhc-imx: remove D3CD check from SDHCI_HOST_CONTROL write SDHCI_CTRL_D3CD is not a standard SDHCI_HOST_CONTROL, so there is no need to check it in SDHCI_HOST_CONTROL write at all. Remove it. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Dirk Behme <dirk.behme@de.bosch.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:57 -05:00
Shawn Guo	ef4d0888bb	mmc: sdhci-esdhc-imx: fix host version read When commit `95a2482` (mmc: sdhci-esdhc-imx: add basic imx6q usdhc support) works around host version issue on imx6q, it gets the register address fixup "reg ^= 2" lost for imx25/35/51/53 esdhc. Thus, the controller version on these SoCs is wrongly identified as v1 while it's actually v2. Add the address fixup back and take a different approach to correct imx6q host version, so that the host version read gets back to work for all SoCs. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:56 -05:00
Tony Prisk	893613b06f	mmc: vt8500: Remove erroneous __exitp in wmt_mci_driver With the __devinit/__devexit attributes having been removed, this __exitp attribute causes an unused function warning and should be removed as well. Signed-off-by: Tony Prisk <linux@prisktech.co.nz> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:56 -05:00
Doug Anderson	9640639b09	mmc: dw_mmc: Remove DW_MCI_QUIRK_NO_WRITE_PROTECT The original quirk was added in the change 'mmc: dw_mmc: add quirk to indicate missing write protect line'. The original quirk was added at a controller level even though each slot has its own write protect (so the quirk should be at the slot level). A recent change (mmc: dw_mmc: Add "disable-wp" device tree property) added a slot-level quirk and support for the quirk directly to dw_mmc. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Will Newton <will.newton@imgtec.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:55 -05:00
Doug Anderson	55a6ceb2d5	mmc: dw_mmc: Handle wp-gpios from device tree On some SoCs (like exynos5250) you need to use an external GPIO for write protect. Add support for wp-gpios to the core dw_mmc driver since it could be useful across multiple SoCs. With this change I am able to make use of the write protect for the external SD slot on exynos5250-snow. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:54 -05:00
Doug Anderson	07b240411f	mmc: dw_mmc: exynos: Remove code for wp-gpios The exynos code claimed the write protect with devm_gpio_request() but never did anything with it. That meant that anyone using a write protect GPIO would effectively be write protected all the time. The handling for wp-gpios belongs in the main dw_mmc driver and has been moved there. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:54 -05:00
Doug Anderson	488755b5cc	ARM: dts: Add disable-wp for sd card slot on smdk5250 The next change will remove the code from the dw_mmc-exynos that added the DW_MCI_QUIRK_NO_WRITE_PROTECT. Keep existing functionality of having no write protect pin on smdk5250 by adding the disable-wp property. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:53 -05:00
Doug Anderson	a70aaa64da	mmc: dw_mmc: Add "disable-wp" device tree property The "disable-wp" property is used to specify that a given SD card slot doesn't have a concept of write protect. This eliminates the need for special case code for SD slots that should never be write protected (like a micro SD slot or a dev board). The dw_mmc driver is special in needing to specify "disable-wp" because the lack of a "wp-gpios" property means to use the special purpose write protect line. On some other mmc devices the lack of "wp-gpios" means that write protect should be disabled. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:52 -05:00
Zhang, YiX X	c148e9ff4b	mmc: correct the EXCEPTION_EVENTS_STATUS value in comment The right value is 54 according to eMMC 4.5 specification. Signed-off-by: ZhangYi <yix.x.zhang@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:52 -05:00
Fabio Estevam	fd63ac761a	mmc: mxs-mmc: Fix warning due to incorrect type Fixes the following warning when building with W=1 option: drivers/mmc/host/mxs-mmc.c: In function 'mxs_mmc_adtc': drivers/mmc/host/mxs-mmc.c:401:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] The warning happens because 'i' is used in 'for_each_sg(sgl, sg, sg_len, i)' and should be made unsigned. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:51 -05:00
Fabio Estevam	e7be434acc	mmc: mxs-mmc: Add MODULE_ALIAS() Add an entry for MODULE_ALIAS(). Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:50 -05:00
Thomas Petazzoni	6f1989bc98	mmc: mvsdio: add pinctrl integration On many Marvell SoCs, the pins used for the SDIO interface are part of the MPP pins, that are muxable pins. In order to get the muxing of those pins correct, this commit integrates the mvsdio driver with the pinctrl infrastructure by calling devm_pinctrl_get_select_default() during ->probe(). Note that we permit this function to fail because not all Marvell platforms have yet been fully converted to using the pinctrl infrastructure. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Stefan Peter <s.peter@mpl.ch> Tested-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Chris Ball <cjb@laptop.org>	2013-02-24 14:36:42 -05:00
Paul Gortmaker	62ed839dba	gianfar: fix compile fail for NET_POLL=y due to struct packing Commit `ee873fda3b` ("gianfar: Pack struct gfar_priv_grp into three cachelines") moved the irq number and names off into a separate struct and created accessors for them. However it was never tested with NET_POLL enabled, and so some conversions that were simply overlooked went undetected until now. Make the netpoll ones also use the gfar_irq() accessors. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Claudiu Manoil <claudiu.manoil@freescale.com> Cc: Jianhua Xie <jianhua.xie@freescale.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-24 12:04:01 -05:00
Linus Torvalds	9e2d59ad58	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...	2013-02-23 18:50:11 -08:00
Dave Airlie	28ee46184f	Merge branch 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next Thierry writes: "Remove a duplicate implementation of the CEA VIC lookup and move the CEA and other mode tables to drm_edid.c to make it more difficult to create duplicates of the tables. Add some helpers to pack CEA-861/HDMI AVI, audio and SPD infoframes into binary buffers that can easily be written into hardware registers. A new helper function makes it easy construct an AVI infoframe from a DRM display mode. Convert the Tegra and Radeon drivers to use the new HDMI helpers." * 'drm/hdmi-for-3.9' of git://anongit.freedesktop.org/tegra/linux: drm/radeon: Use generic HDMI infoframe helpers drm/tegra: Use generic HDMI infoframe helpers drm: Add EDID helper documentation drm: Add HDMI infoframe helpers video: Add generic HDMI infoframe helpers drm: Add some missing forward declarations drm: Move mode tables to drm_edid.c drm: Remove duplicate drm_mode_cea_vic()	2013-02-24 12:39:42 +10:00
Dave Airlie	a497bfe9db	Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next Two regressions fixes from snowboarding land * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: Revert hdmi HDP pin checks drm/i915: Handle untiled planes when computing their offsets	2013-02-24 12:39:02 +10:00
Dave Airlie	a3b1097c03	Merge branch 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux into drm-next Thierry writes: "Add support for 2 hardware overlays found on Tegra. These support YUV pixel formats and can be used as video overlays. .mode_set_base() is implemented and support for VBLANK and page-flipping is added. A few minor bug fixes are also included and a new debugfs file allows to inspect the framebuffers attached to the Tegra DRM device." * 'drm/tegra-for-3.9' of git://anongit.freedesktop.org/tegra/linux: drm/tegra: Add list of framebuffers to debugfs drm/tegra: Fix color expansion drm/tegra: Split DC_CMD_STATE_CONTROL register write drm/tegra: Implement page-flipping support drm/tegra: Implement VBLANK support drm/tegra: Implement .mode_set_base() drm/tegra: Add plane support drm/tegra: Remove bogus tegra_framebuffer structure drm: Add consistency check for page-flipping	2013-02-24 12:38:22 +10:00
Cong Wang	da8c87241c	vlan: adjust vlan_set_encap_proto() for its callers There are two places to call vlan_set_encap_proto(): vlan_untag() and __pop_vlan_tci(). vlan_untag() assumes skb->data points after mac addr, otherwise the following code vhdr = (struct vlan_hdr ) skb->data; vlan_tci = ntohs(vhdr->h_vlan_TCI); __vlan_hwaccel_put_tag(skb, vlan_tci); skb_pull_rcsum(skb, VLAN_HLEN); won't be correct. But __pop_vlan_tci() assumes points _before_ mac addr. In vlan_set_encap_proto(), it looks for some magic L2 value after mac addr: rawp = skb->data; if ((unsigned short *) rawp == 0xFFFF) ... Therefore __pop_vlan_tci() is obviously wrong. A quick fix is avoiding using skb->data in vlan_set_encap_proto(), use 'vhdr+1' is always correct in both cases. Cc: David S. Miller <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-23 21:00:06 -05:00
Linus Torvalds	5ce1a70e2f	Merge branch 'akpm' (more incoming from Andrew) Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...	2013-02-23 17:50:35 -08:00
Hugh Dickins	ef53d16cde	ksm: allocate roots when needed It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically allocated, particularly when very few users will ever actually tune merge_across_nodes 0 to use more than 1+1 of those trees. Not a big deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity. Start off with 1+1 statically allocated, then if merge_across_nodes is ever tuned, allocate for nr_node_ids+nr_node_ids. Do not attempt to free up the extra if it's tuned back, that would be a waste of effort. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:24 -08:00
Hugh Dickins	56f31801cc	mm: cleanup "swapcache" in do_swap_page I dislike the way in which "swapcache" gets used in do_swap_page(): there is always a page from swapcache there (even if maybe uncached by the time we lock it), but tests are made according to "swapcache". Rework that with "page != swapcache", as has been done in unuse_pte(). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:24 -08:00
Hugh Dickins	9e16b7fb1d	mm,ksm: swapoff might need to copy Before establishing that KSM page migration was the cause of my WARN_ON_ONCE(page_mapped(page))s, I suspected that they came from the lack of a ksm_might_need_to_copy() in swapoff's unuse_pte() - which in many respects is equivalent to faulting in a page. In fact I've never caught that as the cause: but in theory it does at least need the KSM_RUN_UNMERGE check in ksm_might_need_to_copy(), to avoid bringing a KSM page back in when it's not supposed to be. I intended to copy how it's done in do_swap_page(), but have a strong aversion to how "swapcache" ends up being used there: rework it with "page != swapcache". Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Hugh Dickins	5117b3b835	mm,ksm: FOLL_MIGRATION do migration_entry_wait In "ksm: remove old stable nodes more thoroughly" I said that I'd never seen its WARN_ON_ONCE(page_mapped(page)). True at the time of writing, but it soon appeared once I tried fuller tests on the whole series. It turned out to be due to the KSM page migration itself: unmerge_and_ remove_all_rmap_items() failed to locate and replace all the KSM pages, because of that hiatus in page migration when old pte has been replaced by migration entry, but not yet by new pte. follow_page() finds no page at that instant, but a KSM page reappears shortly after, without a fault. Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait() for KSM's break_cow(). I'd have preferred to avoid another flag, and do it every time, in case someone else makes the same easy mistake; but did not find another transgressor (the common get_user_pages() is of course safe), and cannot be sure that every follow_page() caller is prepared to sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can already sleep there, since anon_vma locking was changed to mutex, but maybe that's somehow excluded. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Hugh Dickins	bc56620b49	ksm: shrink 32-bit rmap_item back to 32 bytes Think of struct rmap_item as an extension of struct page (restricted to MADV_MERGEABLE areas): there may be a lot of them, we need to keep them small, especially on 32-bit architectures of limited lowmem. Siting "int nid" after "unsigned int checksum" works nicely on 64-bit, making no change to its 64-byte struct rmap_item; but bloats the 32-bit struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which rounds up to 40 bytes once allocated from slab. We'd better avoid that. Hey, I only just remembered that the anon_vma pointer in struct rmap_item has no purpose until the rmap_item is hung from a stable tree node (which has its own nid field); and rmap_item's nid field no purpose than to say which tree root to tell rb_erase() when unlinking from an unstable tree. Double them up in a union. There's just one place where we set anon_vma early (when we already hold mmap_sem): now we must remove tree_rmap_item from its unstable tree there, before overwriting nid. No need to spatter BUG()s around: we'd be seeing oopses if this were wrong. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Hugh Dickins	b599cbdf1c	ksm: treat unstable nid like in stable tree An inconsistency emerged in reviewing the NUMA node changes to KSM: when meeting a page from the wrong NUMA node in a stable tree, we say that it's okay for comparisons, but not as a leaf for merging; whereas when meeting a page from the wrong NUMA node in an unstable tree, we bail out immediately. Now, it might be that a wrong NUMA node in an unstable tree is more likely to correlate with instablility (different content, with rbnode now misplaced) than page migration; but even so, we are accustomed to instablility in the unstable tree. Without strong evidence for which strategy is generally better, I'd rather be consistent with what's done in the stable tree: accept a page from the wrong NUMA node for comparison, but not as a leaf for merging. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Hugh Dickins	8fdb3dbf02	ksm: add some comments Added slightly more detail to the Documentation of merge_across_nodes, a few comments in areas indicated by review, and renamed get_ksm_page()'s argument from "locked" to "lock_it". No functional change. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Greg Thelen	49cd0a5c29	tmpfs: fix mempolicy object leaks Fix several mempolicy leaks in the tmpfs mount logic. These leaks are slow - on the order of one object leaked per mount attempt. Leak 1 (umount doesn't free mpol allocated in mount): while true; do mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt umount /mnt done Leak 2 (errors parsing remount options will leak mpol): mount -t tmpfs -o size=100M nodev /mnt while true; do mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null done umount /mnt Leak 3 (multiple mpol per mount leak mpol): while true; do mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt umount /mnt done This patch fixes all of the above. I could have broken the patch into three pieces but is seemed easier to review as one. [akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh] Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Greg Thelen	5f00110f72	tmpfs: fix use-after-free of mempolicy object The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given. Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object. To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run: # mkdir /tmp/x # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0 # mount -o remount,size=200M nodev /tmp/x # grep /tmp/x /proc/mounts nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0 # note ? garbage in mpol=... output above # dd if=/dev/zero of=/tmp/x/f count=1 # panic here Panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) [...] Oops: 0010 [#1] SMP DEBUG_PAGEALLOC Call Trace: mpol_shared_policy_init+0xa5/0x160 shmem_get_inode+0x209/0x270 shmem_mknod+0x3e/0xf0 shmem_create+0x18/0x20 vfs_create+0xb5/0x130 do_last+0x9a1/0xea0 path_openat+0xb3/0x4d0 do_filp_open+0x42/0xa0 do_sys_open+0xfe/0x1e0 compat_sys_open+0x1b/0x20 cstar_dispatch+0x7/0x1f Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior. The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol: config = sbinfo shmem_parse_options(data, &config, true) mpol_put(sbinfo->mpol) sbinfo->mpol = config.mpol / BUG: saves unreferenced mpol */ This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol. How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Mel Gorman	67d46b296a	mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages Rob van der Heij reported the following (paraphrased) on private mail. The scenario is that I want to avoid backups to fill up the page cache and purge stuff that is more likely to be used again (this is with s390x Linux on z/VM, so I don't give it as much memory that we don't care anymore). So I have something with LD_PRELOAD that intercepts the close() call (from tar, in this case) and issues a posix_fadvise() just before closing the file. This mostly works, except for small files (less than 14 pages) that remains in page cache after the face. Unfortunately Rob has not had a chance to test this exact patch but the test program below should be reproducing the problem he described. The issue is the per-cpu pagevecs for LRU additions. If the pages are added by one CPU but fadvise() is called on another then the pages remain resident as the invalidate_mapping_pages() only drains the local pagevecs via its call to pagevec_release(). The user-visible effect is that a program that uses fadvise() properly is not obeyed. A possible fix for this is to put the necessary smarts into invalidate_mapping_pages() to globally drain the LRU pagevecs if a pagevec page could not be discarded. The downside with this is that an inode cache shrink would send a global IPI and memory pressure potentially causing global IPI storms is very undesirable. Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to check if invalidate_mapping_pages() discarded all the requested pages. If a subset of pages are discarded it drains the LRU pagevecs and tries again. If the second attempt fails, it assumes it is due to the pages being mapped, locked or dirty and does not care. With this patch, an application using fadvise() correctly will be obeyed but there is a downside that a malicious application can force the kernel to send global IPIs and increase overhead. If accepted, I would like this to be considered as a -stable candidate. It's not an urgent issue but it's a system call that is not working as advertised which is weak. The following test program demonstrates the problem. It should never report that pages are still resident but will without this patch. It assumes that CPU 0 and 1 exist. int main() { int fd; int pagesize = getpagesize(); ssize_t written = 0, expected; char buf; unsigned char vec; int resident, i; cpu_set_t set; /* Prepare a buffer for writing / expected = FILESIZE_PAGES pagesize; buf = malloc(expected + 1); if (buf == NULL) { printf("ENOMEM\n"); exit(EXIT_FAILURE); } buf[expected] = 0; memset(buf, 'a', expected); /* Prepare the mincore vec / vec = malloc(FILESIZE_PAGES); if (vec == NULL) { printf("ENOMEM\n"); exit(EXIT_FAILURE); } / Bind ourselves to CPU 0 / CPU_ZERO(&set); CPU_SET(0, &set); if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) { perror("sched_setaffinity"); exit(EXIT_FAILURE); } / open file, unlink and write buffer / fd = open("fadvise-test-file", O_CREAT\|O_EXCL\|O_RDWR); if (fd == -1) { perror("open"); exit(EXIT_FAILURE); } unlink("fadvise-test-file"); while (written < expected) { ssize_t this_write; this_write = write(fd, buf + written, expected - written); if (this_write == -1) { perror("write"); exit(EXIT_FAILURE); } written += this_write; } free(buf); / * Force ourselves to another CPU. If fadvise only flushes the local * CPUs pagevecs then the fadvise will fail to discard all file pages / CPU_ZERO(&set); CPU_SET(1, &set); if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) { perror("sched_setaffinity"); exit(EXIT_FAILURE); } / sync and fadvise to discard the page cache / fsync(fd); if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) { perror("posix_fadvise"); exit(EXIT_FAILURE); } / map the file and use mincore to see which parts of it are resident / buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0); if (buf == NULL) { perror("mmap"); exit(EXIT_FAILURE); } if (mincore(buf, expected, vec) == -1) { perror("mincore"); exit(EXIT_FAILURE); } / Check residency */ for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) { if (vec[i]) resident++; } if (resident != 0) { printf("Nr unexpected pages resident: %d\n", resident); exit(EXIT_FAILURE); } munmap(buf, expected); close(fd); free(vec); exit(EXIT_SUCCESS); } Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Rob van der Heij <rvdheij@gmail.com> Tested-by: Rob van der Heij <rvdheij@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Cliff Wickman	fa794199e3	mm: export mmu notifier invalidates We at SGI have a need to address some very high physical address ranges with our GRU (global reference unit), sometimes across partitioned machine boundaries and sometimes with larger addresses than the cpu supports. We do this with the aid of our own 'extended vma' module which mimics the vma. When something (either unmap or exit) frees an 'extended vma' we use the mmu notifiers to clean them up. We had been able to mimic the functions __mmu_notifier_invalidate_range_start() and __mmu_notifier_invalidate_range_end() by locking the per-mm lock and walking the per-mm notifier list. But with the change to a global srcu lock (static in mmu_notifier.c) we can no longer do that. Our module has no access to that lock. So we request that these two functions be exported. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Michel Lespinasse	240aadeedc	mm: accelerate mm_populate() treatment of THP pages This change adds a follow_page_mask function which is equivalent to follow_page, but with an extra page_mask argument. follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters a THP page, and to 0 in other cases. __get_user_pages() makes use of this in order to accelerate populating THP ranges - that is, when both the pages and vmas arrays are NULL, we don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and we also avoid taking mm->page_table_lock that many times). Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:23 -08:00
Michel Lespinasse	28a35716d3	mm: use long type for page counts in mm_populate() and get_user_pages() Use long type for page counts in mm_populate() so as to avoid integer overflow when running the following test code: int main(void) { void *p = mmap(NULL, 0x100000000000, PROT_READ, MAP_PRIVATE \| MAP_ANON, -1, 0); printf("p: %p\n", p); mlockall(MCL_CURRENT); printf("done\n"); return 0; } Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:22 -08:00
Zhang Yanfei	e0fb581529	mm: accurately document nr_free_*_pages functions with code comments nr_free_zone_pages(), nr_free_buffer_pages() and nr_free_pagecache_pages() are horribly badly named, so accurately document them with code comments in case of the misuse of them. [akpm@linux-foundation.org: tweak comments] Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:22 -08:00
Naoya Horiguchi	5f4b9fc5c1	HWPOISON: change order of error_states[]'s elements error_states[] has two separate states "unevictable LRU page" and "mlocked LRU page", and the former one has the higher priority now. But because of that the latter one is rarely chosen because pages with PageMlocked highly likely have PG_unevictable set. On the other hand, PG_unevictable without PageMlocked is common for ramfs or SHM_LOCKed shared memory, so reversing the priority of these two states helps us clearly distinguish them. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:22 -08:00

... 2 3 4 5 6 ...

359008 Commits