Commit Graph

344063 Commits

Author SHA1 Message Date
Linus Torvalds
a11da7df65 ARM: arm-soc: power management and clock changes
This branch contains a largeish set of updates of power management and
 clock setup. The bulk of it is for OMAP/AM33xx platforms, but also a
 few around hotplug/suspend/resume on Exynos.
 
 It includes a split-up of some of the OMAP clock data into separate
 files which adds to the diffstat, but gross delta is fairly reasonable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQySrdAAoJEIwa5zzehBx3XBMQAJCgrE/I1vbuiENiVMLpQkN7
 0Y5t89SwoALnme5jsM8vW/H0o6b163FfH+249UNs6dyk6MdCqqSv8cQVvSfCKRKn
 m4JTXS6njSnYgU025klNU9W3FyMuFcgH8b63+c5+sCcqfvdkxrjpNoKqAYNxQlgJ
 6DHgHCKZP3JB1e2EFqViw0GJgZIjTNxlWvx8XLcAShSapQ+Ovq/iAcKo41o7ZadB
 4zV//VkBMv0TLNTs6SoR0EO8qM+2nIUKE8fgPrRxvvb98tuasKPvlSq9VUeIjnYk
 kWjNTQYbD1IkrAMPYrwcpLU3xkEr14wrKDBtPK6GdCDcXzdyKq3fzROOXNsqrLmn
 Y8PkY+J39B59F0DKjyrCjasZyi0tQQV6ps5Xm2X2CB003GWEbo1yu+BodShYEyaL
 7OvkLiVpOtq+LOYcsScHtOGdO9le2O3yZevNRvlrCDCvJUYbXNjaM9ZrWcQuTlJc
 oseYNSRPaIP5PMv2c+Xup9qh/dyAjj8g6gSi9BlDvwVOu/+Cf3laaCAXaFph22jT
 /m8fW2paiP5UJ0gSt1MLAGFxKi0YI/Qxck8G11LJxUwMZd3SIfOPUAY27tnNC4yL
 GRynOi3BFRMUAe/Leu2qgwEyoME5nHn+OmAKb36WTYH/HL+PGzHq84e+Wfdan8ha
 YcSKc1A52LSoYTi4XTUX
 =h2Qo
 -----END PGP SIGNATURE-----

Merge tag 'pm-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC power management and clock changes from Olof Johansson:
 "This branch contains a largeish set of updates of power management and
  clock setup.  The bulk of it is for OMAP/AM33xx platforms, but also a
  few around hotplug/suspend/resume on Exynos.

  It includes a split-up of some of the OMAP clock data into separate
  files which adds to the diffstat, but gross delta is fairly reasonable."

* tag 'pm-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits)
  ARM: OMAP: Move plat-omap/dma-omap.h to include/linux/omap-dma.h
  ASoC: OMAP: mcbsp fixes for enabling ARM multiplatform support
  watchdog: OMAP: fixup for ARM multiplatform support
  ARM: EXYNOS: Add flush_cache_all in suspend finisher
  ARM: EXYNOS: Remove scu_enable from cpuidle
  ARM: EXYNOS: Fix soft reboot hang after suspend/resume
  ARM: EXYNOS: Add support for rtc wakeup
  ARM: EXYNOS: fix the hotplug for Cortex-A15
  ARM: OMAP2+: omap_device: Correct resource handling for DT boot
  ARM: OMAP2+: hwmod: Add possibility to count hwmod resources based on type
  ARM: OMAP2+: hwmod: Add support for per hwmod/module context lost count
  ARM: OMAP2+: PRM: initialize some PRM functions early
  ARM: OMAP2+: voltage: fixup oscillator handling when CONFIG_PM=n
  ARM: OMAP4: USB: power down MUSB PHY during boot
  ARM: OMAP2+: clock: Cleanup !CONFIG_COMMON_CLK parts
  ARM: OMAP2xxx: clock: drop obsolete clock data
  ARM: OMAP2: clock: Cleanup !CONFIG_COMMON_CLK parts
  ARM: OMAP3+: DPLL: drop !CONFIG_COMMON_CLK sections
  ARM: AM33xx: clock: drop obsolete clock data
  ARM: OMAP3xxx: clk: drop obsolete clock data
  ...
2012-12-13 10:58:20 -08:00
Linus Torvalds
b8edf848e9 ARM: arm-soc: multiplatform conversion patches
Here are more patches in the progression towards multiplatform, sparse
 irq conversions in particular.
 
 Tegra has a handful of cleanups and general groundwork, but is
 not quite there yet on full enablement.
 
 Platforms that are enabled through this branch are VT8500 and Zynq. note
 that i.MX was converted in one of the earlier cleanup branches as
 well (before we started a separate topic for multiplatform). And both
 new platforms for this merge window, sunxi and bcm, were merged with
 multiplatform support enabled.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQySb/AAoJEIwa5zzehBx3Wo4P/0GrpUhB/qwuhgy43MA2I1Dv
 tnyuFvsfW9uRExcw2IwT39GFls98QUM9TwQxPqOTHVf+u0LkYMZ9aDeWJOdj3RvG
 H70Ypj4gZDrzZAFr2TUf8NnYGHd6G2EcMn3261Hjfd7YrswCjsMPvgRns7VOyHCa
 deif3KcLu3+HzxvuzqlVlTuSAagCQbfqqnTQduMRdJPHT3X3sXwl7ABW+qfOoeYC
 rjqIbjdh5dB1d/f7igtgBbXjSTnVz/Mr1+wk4rp9Xr1Wv0IXvIaSKjK2Df8ZuNAk
 aQ6mMy/oDVxlDSrYv0F7lB40/rsZcPqz8+fgYJ2FnvCpIM7z7NeTWD2kQJ2UaQ/s
 VunShloRxF8It6104EVWZDfEA9NvVBcCALSze0NukqiHZRZYGUzxRNQDrncaksC9
 Lm+Z16cUWogsZq7VDCgXYQJeakPQfBDnsx7siMvAbOgvtpSClxuwhdC/czJiix7h
 BcpA+l5xSviUhHvzHhDt9iJxHjbUmo1xLDvaZSgj2OjAj257JcwaNBCk5BjZTCwe
 xZmQu1FjwaGtjLiG6QY0WJRsq1hiFRIb/MaWar/WpfqADFqARoambGFUjOl+P4Mu
 DIM5Z0AS04H+pLuP1QOz/yXxOPEP6Ri36to6XrgzfL/XGet5LW2P59xXxhcWC/OL
 /3IAcQrsAqh4aGMOstW1
 =UJlh
 -----END PGP SIGNATURE-----

Merge tag 'multiplatform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC multiplatform conversion patches from Olof Johansson:
 "Here are more patches in the progression towards multiplatform, sparse
  irq conversions in particular.

  Tegra has a handful of cleanups and general groundwork, but is not
  quite there yet on full enablement.

  Platforms that are enabled through this branch are VT8500 and Zynq.
  Note that i.MX was converted in one of the earlier cleanup branches as
  well (before we started a separate topic for multiplatform).  And both
  new platforms for this merge window, sunxi and bcm, were merged with
  multiplatform support enabled."

Fix up conflicts mostly as per Olof.

* tag 'multiplatform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (29 commits)
  ARM: zynq: Remove all unused mach headers
  ARM: zynq: add support for ARCH_MULTIPLATFORM
  ARM: zynq: make use of debug_ll_io_init()
  ARM: zynq: remove TTC early mapping
  ARM: tegra: move debug-macro.S to include/debug
  ARM: tegra: don't include iomap.h from debug-macro.S
  ARM: tegra: decouple uncompress.h and debug-macro.S
  ARM: tegra: simplify DEBUG_LL UART selection options
  ARM: tegra: select SPARSE_IRQ
  ARM: tegra: enhance timer.c to get IO address from device tree
  ARM: tegra: enhance timer.c to get IRQ info from device tree
  ARM: timer: fix checkpatch warnings
  ARM: tegra: add TWD to device tree
  ARM: tegra: define DT bindings for and instantiate RTC
  ARM: tegra: define DT bindings for and instantiate timer
  clocksource/mtu-nomadik: use apb_pclk
  clk: ux500: Register mtu apb_pclocks
  ARM: plat-nomadik: convert platforms to SPARSE_IRQ
  mfd/db8500-prcmu: use the irq_domain_add_simple()
  mfd/ab8500-core: use irq_domain_add_simple()
  ...
2012-12-13 10:57:16 -08:00
Linus Torvalds
db5b0ae007 ARM: arm-soc: device tree conversions and enablement
Continued device tree conversion and enablement across a number of
 platforms; Kirkwood, tegra, i.MX, Exynos, zynq and a couple of other
 smaller series as well.
 
 ux500 has seen continued conversion for platforms. Several platforms have
 seen pinctrl-via-devicetree conversions for simpler multiplatform. Tegra
 is adding data for new devices/drivers, and Exynos has a bunch of new
 bindings and devices added as well.
 
 So, pretty much the same progression in the right direction as the last
 few releases.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQySW7AAoJEIwa5zzehBx39xcP/jzEQOTOJdK4zJd1OjgrQoX/
 WnhbGJT941RNjRjvDG6HmZzhpsRoE4q/zkjFEKoKELdikRW0hYoR+zPCGuB7XtN5
 aF1ZQrTx4gHf4KE7doIB8slaWeOq8aG2TLFhylyy+cuaIpRK0NG0pAR0ZqWaoga9
 tZFciqzplLeo50vZ+y+lVVsR40j/w29EjwPXhCV30//gGOYLyp/VDu5PRtrBdgh8
 EgpcT2EWJwMCN/Upcao/q2JbQktPHPpSwnpaUAALYB20uD7k5jo7wtYE/+L9nn6B
 bxcCDTMVmqzNTF+y0P16hDcs5jMLVjpI0xBiyZ1G6gShpggsSZCHY5ynjAtQ19se
 r+2WrNfOR23k6arJuOUAQSEnLdx0T5SlW6CJeFEofKv4uoebxAbKUiNO4ShWskhd
 nNptX1+L3hj3zpjGcEHmL6bd+nGtyMeoG9Yekcv1oZxdVcpKhFxh0s5PEJBEeXcN
 M7aAWlWJkplV22Olqhpc/3INCweq6E+zBrBxZaUBW/JCzGrqBUGC0BULDPAkmC4J
 CKL6IqIB73jGQ4OY14IaMU20GJrIGxZ7wzXOp4aw3OUpRlxsgurfyFQeIjUvVoZL
 PJ8DRoAVwreVHvKfgZZVKpSAY7dwcWbxpWsYlrH3zWIC5vRJ0UFwsD0TpLJWd6Vi
 XA8gQcJRWKGS8E5mRY39
 =Rk9v
 -----END PGP SIGNATURE-----

Merge tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC device tree conversions and enablement from Olof Johansson:
 "Continued device tree conversion and enablement across a number of
  platforms; Kirkwood, tegra, i.MX, Exynos, zynq and a couple of other
  smaller series as well.

  ux500 has seen continued conversion for platforms.  Several platforms
  have seen pinctrl-via-devicetree conversions for simpler
  multiplatform.  Tegra is adding data for new devices/drivers, and
  Exynos has a bunch of new bindings and devices added as well.

  So, pretty much the same progression in the right direction as the
  last few releases."

Fix up conflicts as per Olof.

* tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (185 commits)
  ARM: ux500: Rename dbx500 cpufreq code to be more generic
  ARM: dts: add missing ux500 device trees
  ARM: ux500: Stop registering the PCM driver from platform code
  ARM: ux500: Move board specific GPIO info out to subordinate DTS files
  ARM: ux500: Disable the MMCI gpio-regulator by default
  ARM: Kirkwood: remove kirkwood_ehci_init() from new boards
  ARM: Kirkwood: Add support LED of OpenBlocks A6
  ARM: Kirkwood: Convert to EHCI via DT for OpenBlocks A6
  ARM: kirkwood: Add NAND partiton map for OpenBlocks A6
  ARM: kirkwood: Add support second I2C bus and RTC on OpenBlocks A6
  ARM: kirkwood: Add support DT of second I2C bus
  ARM: kirkwood: Convert mplcec4 board to pinctrl
  ARM: Kirkwood: Convert km_kirkwood to pinctrl
  ARM: Kirkwood: support 98DX412x kirkwoods with pinctrl
  ARM: Kirkwood: Convert IX2-200 to pinctrl.
  ARM: Kirkwood: Convert lsxl boards to pinctrl.
  ARM: Kirkwood: Convert ib62x0 to pinctrl.
  ARM: Kirkwood: Convert GoFlex Net to pinctrl.
  ARM: Kirkwood: Convert dreamplug to pinctrl.
  ARM: Kirkwood: Convert dockstar to pinctrl.
  ...
2012-12-13 10:39:26 -08:00
stephen hemminger
eca2a43bb0 bridge: fix icmpv6 endian bug and other sparse warnings
Fix the warnings reported by sparse on recent bridge multicast
changes. Mostly just rcu annotation issues but in this case
sparse found a real bug! The ICMPv6 mld2 query mrc
values is in network byte order.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
Yan Burman
dc2e57340d net: ethool: Document struct ethtool_flow_ext
Add documentation for struct ethtool_flow_ext especially in regard
to what flags are needed for which fields.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
YOSHIFUJI Hideaki / 吉藤英明
7bdc1b4aba ndisc: Fix padding error in link-layer address option.
If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
post padding is not initialized correctly.

(Un)fortunately, the only type that requires pad is Infiniband,
whose pad is 2 and data_len is 20, and this logical error has not
become obvious, but it is better to fix.

Note that ndisc_opt_addr_space() handles the situation described
above correctly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
Eric Dumazet
499744209b tuntap: dont use skb after netif_rx_ni(skb)
On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
> Since todays net merge, I see this when I start openvpn..
>
> general protection fault: 0000 [#1] PREEMPT SMP
> Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
> CPU 0
> Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14                  /D975XBX
> RIP: 0010:[<ffffffff815b54a4>]  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP: 0018:ffff88007d0d9c48  EFLAGS: 00010206
> RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
> RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
> RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
> R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
> FS:  00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
> Stack:
>  ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
>  ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
>  ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
> Call Trace:
>  [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
>  [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
>  [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
>  [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
>  [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
>  [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
>  [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
>  [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
>  [<ffffffff811db917>] do_sync_write+0xa7/0xe0
>  [<ffffffff811dc01f>] vfs_write+0xaf/0x190
>  [<ffffffff811dc375>] sys_write+0x55/0xa0
>  [<ffffffff81705540>] tracesys+0xdd/0xe2
> Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
> RIP  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
>  RSP <ffff88007d0d9c48>
> ---[ end trace 6d42c834c72c002e ]---
>
>
> Faulting instruction is
>
>    0:	8b 03                	mov    (%rbx),%eax
>
> rbx is slab poison (-20) so this looks like a use-after-free here...
>
>                         flow->ports = *ports;
>  314:   8b 03                   mov    (%rbx),%eax
>  316:   41 89 46 08             mov    %eax,0x8(%r14)
>
> in the inlined skb_header_pointer in skb_flow_dissect
>
> 	Dave
>

commit 96442e4242 (tuntap: choose the txq based on rxq) added
a use after free.

Cache rxhash in a temp variable before calling netif_rx_ni()

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
Dave Jones
026e43def7 nfc: remove noisy message from llcp_sock_sendmsg
This is easily triggerable when fuzz-testing as an unprivileged user.
We could rate-limit it, but given we don't print similar messages
for other protocols, I just removed it.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:10 -05:00
Linus Torvalds
6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
Linus Torvalds
e37aa63e87 MN10300 changes 2012-12-12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUMi2WhOxKuMESys7AQK2bxAApJQL2x6/k4swH933rhdVooA2TiMVST3l
 XSy6yil6Qeqz82RDnVMfxQ069N8iP5x93fE918V6UzeIrUmKEL8xD2UJCZzjW6B9
 vmBNrD6VUGdiBhTcGY7er4EtlnRf1XJgUPmfdIEAJoZ8VMkKyYAGkckW2I8hiYbZ
 gyF+ONc+CHxspqS1CzNUmmbP84T6rij2fydqLaSNNnQYnEfICt7dciv73KBQYMtn
 AsCLcmWW4DkZ37VL6Bg8yvgRaxbNlZpS0Rl5oKS65rYX9azt/SvujSta0UEv+uYF
 m/2HqExwgo8HZHKyIEpRgBLqfOfekJATbSLEq3jEgA73MLdzw2DTgpJQOmWCjtjN
 7bROv2O57e8ttxb81x10YyInzOTYOd18XEb2Qa6O4wbB5TS8MxZywfuTfL+sdfsN
 pquqyKNgxD7HqqxIcWSNKGxkPPZ/Xk/JmgcQFVCjpvvdCizsFTwWeiAd81Jz0Dn+
 SLL345nlDJPVukgIiDiwm9UvkyG0Pg03K5k6+7QOWB/5AdPqgRUeOi6gqQE7ZQ9G
 GK8/2xX4xFJ8LLPqfh2X+1PUesa8Dhph4NorsW4comJtPcLuh30XbwIKTjpBF90y
 7OILeZeQ+qFu8S9lLSQOr6zxs3/9uKP8ADoOAnFmUEE2PkzvZqMrnrlezAyqtNht
 LkVa/IR/z50=
 =ex0l
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300

Pull MN10300 changes from David Howells:
 "miscellaneous MN10300 arch patches.  I've based it on top of Al Viro's
  signal tree - so these patches should be pulled after that."

* tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
  MN10300: Use asm-generic/pci_iomap.h
  MN10300: Get rid of unused variable from ASB2305 PCI code
  MN10300: ASB2305 PCI code needs linux/irq.h
  mn10300/mm/fault.c: Port OOM changes to do_page_fault
  MN10300: Handle cacheable PCI regions in pci_iomap()
  MN10300: fix debug polling in ttySM driver
  MN10300: ttySM: clean up unnecessary casting
  MN10300: fix SMP synchronization between txdma and serial driver
  MN10300: fix serial port vdma irq setup for SMP
  MN10300: cleanup IRQ affinity setting
  MN10300: ttySM: Use memory barriers correctly in circular buffer logic
2012-12-12 17:50:34 -08:00
Lin Feng
98870901cc mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
reserve_bootmem_generic() has no caller,

Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Dominik Dingel
66521d5aa6 mm/memory.c: remove unused code from do_wp_page()
page_mkwrite is initalized with zero and only set once, from that point
exists no way to get to the oom or oom_free_new labels.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Kirill A. Shutemov
816422ad76 asm-generic, mm: pgtable: consolidate zero page helpers
We have two different implementation of is_zero_pfn() and my_zero_pfn()
helpers: for architectures with and without zero page coloring.

Let's consolidate them in <asm-generic/pgtable.h>.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
56f2fb1476 mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
Fix the warning from __list_del_entry() which is triggered when a process
tries to do free_huge_page() for a hwpoisoned hugepage.

free_huge_page() can be called for hwpoisoned hugepage from
unpoison_memory().  This function gets refcount once and clears
PageHWPoison, and then puts refcount twice to return the hugepage back to
free pool.  The second put_page() finally reaches free_huge_page().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
5f24ae585b hwpoison, hugetlbfs: fix RSS-counter warning
Memory error handling on hugepages can break a RSS counter, which emits a
message like "Bad rss-counter state mm:ffff88040abecac0 idx:1 val:-1".
This is because PageAnon returns true for hugepage (this behavior is
necessary for reverse mapping to work on hugetlbfs).

[akpm@linux-foundation.org: clean up code layout]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
8c4894c6bc hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
When a process which used a hwpoisoned hugepage tries to exit() or
munmap(), the kernel can print out "bad pmd" message because page table
walker in free_pgtables() encounters 'hwpoisoned entry' on pmd.

This is because currently we fail to clear the hwpoisoned entry in
__unmap_hugepage_range(), so this patch simply does it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Michel Lespinasse
4128997b5f mm: protect against concurrent vma expansion
expand_stack() runs with a shared mmap_sem lock.  Because of this, there
could be multiple concurrent stack expansions in the same mm, which may
cause problems in the vma gap update code.

I propose to solve this by taking the mm->page_table_lock around such vma
expansions, in order to avoid the concurrency issue.  We only have to
worry about concurrent expand_stack() calls here, since we hold a shared
mmap_sem lock and all vma modificaitons other than expand_stack() are done
under an exclusive mmap_sem lock.

I previously tried to achieve the same effect by making sure all growable
vmas in a given mm would share the same anon_vma, which we already lock
here.  However this turned out to be difficult - all of the schemes I
tried for refcounting the growable anon_vma and clearing turned out ugly.
So, I'm now proposing only the minimal fix.

The overhead of taking the page table lock during stack expansion is
expected to be small: glibc doesn't use expandable stacks for the threads
it creates, so having multiple growable stacks is actually uncommon and we
don't expect the page table lock to get bounced between threads.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Michal Hocko
c95d26c2ff memcg: do not check for mm in __mem_cgroup_count_vm_event
The mm given to __mem_cgroup_count_vm_event() cannot be NULL because the
function is either called from the page fault path or vma->vm_mm is used.
So the check can be dropped.

The check was introduced by commit 456f998ec8 ("memcg: add the
pagefault count into memcg stats") because the originally proposed patch
used current->mm for shmem but this has been changed to vma->vm_mm later
on without the check being removed (thanks to Hugh for this
recollection).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Hugh Dickins
220f2ac913 tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
Revert 3.5's commit f21f806220 ("tmpfs: revert SEEK_DATA and
SEEK_HOLE") to reinstate 4fb5ef089b ("tmpfs: support SEEK_DATA and
SEEK_HOLE"), with the intervening additional arg to
generic_file_llseek_size().

In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
it on tmpfs, so let's join the party.

It's quite easy for tmpfs to scan the radix_tree to support llseek's new
SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
still on my mind (in particular, the !PageUptodate-ness of pages
fallocated but still unwritten).

[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Jiang Liu
01cefaef40 mm: provide more accurate estimation of pages occupied by memmap
If SPARSEMEM is enabled, it won't build page structures for non-existing
pages (holes) within a zone, so provide a more accurate estimation of
pages occupied by memmap if there are bigger holes within the zone.

And pages for highmem zones' memmap will be allocated from lowmem, so
charge nr_kernel_pages for that.

[akpm@linux-foundation.org: mark calc_memmap_size __paging_init]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Tested-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Yan Hong
02c0ab684f fs/buffer.c: remove redundant initialization in alloc_page_buffers()
buffer_head comes from kmem_cache_zalloc(), no need to zero its fields.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Yan Hong
a3f3c29cb2 fs/buffer.c: do not inline exported function
It makes no sense to inline an exported function.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Yan Hong
5aaea51dfb writeback: fix a typo in comment
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Jiang Liu
9feedc9d83 mm: introduce new field "managed_pages" to struct zone
Currently a zone's present_pages is calcuated as below, which is
inaccurate and may cause trouble to memory hotplug.

	spanned_pages - absent_pages - memmap_pages - dma_reserve.

During fixing bugs caused by inaccurate zone->present_pages, we found
zone->present_pages has been abused.  The field zone->present_pages may
have different meanings in different contexts:

1) pages existing in a zone.
2) pages managed by the buddy system.

For more discussions about the issue, please refer to:
  http://lkml.org/lkml/2012/11/5/866
  https://patchwork.kernel.org/patch/1346751/

This patchset tries to introduce a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system".  And revert
zone->present_pages to count "physical pages existing in a zone", which
also keep in consistence with pgdat->node_present_pages.

We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.

For DMA/normal zones, the initial value is set to:

	(spanned_pages - absent_pages - memmap_pages - dma_reserve)

Later zone->managed_pages will be adjusted to the accurate value when the
bootmem allocator frees all free pages to the buddy system in function
free_all_bootmem_node() and free_all_bootmem().

The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.

This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.

[akpm@linux-foundation.org: small comment tweaks]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
0fa84a4bfa mm, oom: remove redundant sleep in pagefault oom handler
out_of_memory() will already cause current to schedule if it has not been
killed, so doing it again in pagefault_out_of_memory() is redundant.
Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
efacd02e4f mm, oom: cleanup pagefault oom handler
To lock the entire system from parallel oom killing, it's possible to pass
in a zonelist with all zones rather than using for_each_populated_zone()
for the iteration.  This obsoletes try_set_system_oom() and
clear_system_oom() so that they can be removed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan
09285af75d memory_hotplug: allow online/offline memory to result movable node
Now, memory management can handle movable node or nodes which don't have
any normal memory, so we can dynamic configure and add movable node by:

	online a ZONE_MOVABLE memory from a previous offline node
	offline the last normal memory which result a non-normal-memory-node

movable-node is very important for power-saving, hardware partitioning and
high-available-system(hardware fault management).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan
20b2f52b73 numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
We need a node which only contains movable memory.  This feature is very
important for node hotplug.  If a node has normal/highmem, the memory may
be used by the kernel and can't be offlined.  If the node only contains
movable memory, we can offline the memory and the node.

All are prepared, we can actually introduce N_MEMORY.
add CONFIG_MOVABLE_NODE make we can use it for movable-dedicated node

[akpm@linux-foundation.org: fix Kconfig text]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
68ae564bba mm, memcg: avoid unnecessary function call when memcg is disabled
While profiling numa/core v16 with cgroup_disable=memory on the command
line, I noticed mem_cgroup_count_vm_event() still showed up as high as
0.60% in perftop.

This occurs because the function is called extremely often even when memcg
is disabled.

To fix this, inline the check for mem_cgroup_disabled() so we avoid the
unnecessary function call if memcg is disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Andrew Morton
05b0afd73d mm: add a reminder comment for __GFP_BITS_SHIFT
Cc: Glauber Costa <glommer@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Joonsoo Kim
2897b4d29d mm: WARN_ON_ONCE if f_op->mmap() change vma's start address
During reviewing the source code, I found a comment which mention that
after f_op->mmap(), vma's start address can be changed.  I didn't verify
that it is really possible, because there are so many f_op->mmap()
implementation.  But if there are some mmap() which change vma's start
address, it is possible error situation, because we already prepare prev
vma, rb_link and rb_parent and these are related to original address.

So add WARN_ON_ONCE for finding that this situtation really happens.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Greg Thelen
44e33e8f95 res_counter: delete res_counter_write()
Since commit 628f423553 ("memcg: limit change shrink usage") both
res_counter_write() and write_strategy_fn have been unused.  This patch
deletes them both.

Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
6715ddf945 hotplug: update nodemasks management
Update nodemasks management for N_MEMORY.

[lliubbo@gmail.com: fix build]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
4b0ef1fe8a page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Since we introduced N_MEMORY, we update the initialization of node_states.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
48fb2e240c vmscan: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
3c466d46a9 init: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
aee4faa499 kthread: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
a47b53c5f9 vmstat: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
8cebfcd074 hugetlb: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
01f13bd607 mempolicy: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
389162c22d mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
bd3a66c1cd oom: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Lai Jiangshan
31aaea4aa1 memcontrol: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Lai Jiangshan
4ff1b2c293 procfs: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Lai Jiangshan
38d7bee9d2 cpuset: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Lai Jiangshan
8219fc48ad mm: node_states: introduce N_MEMORY
We have N_NORMAL_MEMORY for standing for the nodes that have normal memory
with zone_type <= ZONE_NORMAL.

And we have N_HIGH_MEMORY for standing for the nodes that have normal or
high memory.

But we don't have any word to stand for the nodes that have *any* memory.

And we have N_CPU but without N_MEMORY.

Current code reuse the N_HIGH_MEMORY for this purpose because any node
which has memory must have high memory or normal memory currently.

A)	But this reusing is bad for *readability*. Because the name
	N_HIGH_MEMORY just stands for high or normal:

A.example 1)
	mem_cgroup_nr_lru_pages():
		for_each_node_state(nid, N_HIGH_MEMORY)

	The user will be confused(why this function just counts for high or
	normal memory node? does it counts for ZONE_MOVABLE's lru pages?)
	until someone else tell them N_HIGH_MEMORY is reused to stand for
	nodes that have any memory.

A.cont) If we introduce N_MEMORY, we can reduce this confusing
	AND make the code more clearly:

A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:

	One is in page_cgroup_init(void):
		for_each_node_state(nid, N_HIGH_MEMORY) {

	It means if the node have memory, we will allocate page_cgroup map for
	the node. We should use N_MEMORY instead here to gaim more clearly.

	The second using is in alloc_page_cgroup():
		if (node_state(nid, N_HIGH_MEMORY))
			addr = vzalloc_node(size, nid);

	It means if the node has high or normal memory that can be allocated
	from kernel. We should keep N_HIGH_MEMORY here, and it will be better
	if the "any memory" semantic of N_HIGH_MEMORY is removed.

B)	This reusing is out-dated if we introduce MOVABLE-dedicated node.
	The MOVABLE-dedicated node should not appear in
	node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY],
	because MOVABLE-dedicated node has no high or normal memory.

	In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node
	is in node_stats[N_HIGH_MEMORY], it is also means it is in
	node_stats[N_NORMAL_MEMORY], it causes SLUB wrong.

	The slub uses
		for_each_node_state(nid, N_NORMAL_MEMORY)
	and creates kmem_cache_node for MOVABLE-dedicated node and cause problem.

In one word, we need a N_MEMORY.  We just intrude it as an alias to
N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late
patches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Marek Szyprowski
be49a6e135 mm: use migrate_prep() instead of migrate_prep_local()
__alloc_contig_migrate_range() should use all possible ways to get all the
pages migrated from the given memory range, so pruning per-cpu lru lists
for all CPUs is required, regadless the cost of such operation.  Otherwise
some pages which got stuck at per-cpu lru list might get missed by
migration procedure causing the contiguous allocation to fail.

Reported-by: SeongHwan Yoon <sunghwan.yun@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Thierry Reding
c8bf2d8ba4 mm: compaction: Fix compiler warning
compact_capture_page() is only used if compaction is enabled so it should
be moved into the corresponding #ifdef.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
Kirill A. Shutemov
3ea41e6210 thp: avoid race on multiple parallel page faults to the same page
pmd value is stable only with mm->page_table_lock taken. After taking
the lock we need to check that nobody modified the pmd before changing it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00