Pablo Neira Ayuso says:
====================
netfilter updates: nf_tables pull request
The following patchset contains the current original nf_tables tree
condensed in 17 patches. I have organized them by chronogical order
since the original nf_tables code was released in 2009 and by
dependencies between the different patches.
The patches are:
1) Adapt all existing hooks in the tree to pass hook ops to the
hook callback function, required by nf_tables, from Patrick McHardy.
2) Move alloc_null_binding to nf_nat_core, as it is now also needed by
nf_tables and ip_tables, original patch from Patrick McHardy but
required major changes to adapt it to the current tree that I made.
3) Add nf_tables core, including the netlink API, the packet filtering
engine, expressions and built-in tables, from Patrick McHardy. This
patch includes accumulated fixes since 2009 and minor enhancements.
The patch description contains a list of references to the original
patches for the record. For those that are not familiar to the
original work, see [1], [2] and [3].
4) Add netlink set API, this replaces the original set infrastructure
to introduce a netlink API to add/delete sets and to add/delete
set elements. This includes two set types: the hash and the rb-tree
sets (used for interval based matching). The main difference with
ipset is that this infrastructure is data type agnostic. Patch from
Patrick McHardy.
5) Allow expression operation overload, this API change allows us to
provide define expression subtypes depending on the configuration
that is received from user-space via Netlink. It is used by follow
up patches to provide optimized versions of the payload and cmp
expressions and the x_tables compatibility layer, from Patrick
McHardy.
6) Add optimized data comparison operation, it requires the previous
patch, from Patrick McHardy.
7) Add optimized payload implementation, it requires patch 5, from
Patrick McHardy.
8) Convert built-in tables to chain types. Each chain type have special
semantics (filter, route and nat) that are used by userspace to
configure the chain behaviour. The main chain regarding iptables
is that tables become containers of chain, with no specific semantics.
However, you may still configure your tables and chains to retain
iptables like semantics, patch from me.
9) Add compatibility layer for x_tables. This patch adds support to
use all existing x_tables extensions from nf_tables, this is used
to provide a userspace utility that accepts iptables syntax but
used internally the nf_tables kernel core. This patch includes
missing features in the nf_tables core such as the per-chain
stats, default chain policy and number of chain references, which
are required by the iptables compatibility userspace tool. Patch
from me.
10) Fix transport protocol matching, this fix is a side effect of the
x_tables compatibility layer, which now provides a pointer to the
transport header, from me.
11) Add support for dormant tables, this feature allows you to disable
all chains and rules that are contained in one table, from me.
12) Add IPv6 NAT support. At the time nf_tables was made, there was no
NAT IPv6 support yet, from Tomasz Bursztyka.
13) Complete net namespace support. This patch register the protocol
family per net namespace, so tables (thus, other objects contained
in tables such as sets, chains and rules) are only visible from the
corresponding net namespace, from me.
14) Add the insert operation to the nf_tables netlink API, this requires
adding a new position attribute that allow us to locate where in the
ruleset a rule needs to be inserted, from Eric Leblond.
15) Add rule batching support, including atomic rule-set updates by
using rule-set generations. This patch includes a change to nfnetlink
to include two new control messages to indicate the beginning and
the end of a batch. The end message is interpreted as the commit
message, if it's missing, then the rule-set updates contained in the
batch are aborted, from me.
16) Add trace support to the nf_tables packet filtering core, from me.
17) Add ARP filtering support, original patch from Patrick McHardy, but
adapted to fit into the chain type infrastructure. This was recovered
to be used by nft userspace tool and our compatibility arptables
userspace tool.
There is still work to do to fully replace x_tables [4] [5] but that can
be done incrementally by extending our netlink API. Moreover, looking at
netfilter-devel and the amount of contributions to nf_tables we've been
getting, I think it would be good to have it mainstream to avoid accumulating
large patchsets skip continuous rebases.
I tried to provide a reasonable patchset, we have more than 100 accumulated
patches in the original nf_tables tree, so I collapsed many of the small
fixes to the main patch we had since 2009 and provide a small batch for
review to netdev, while trying to retain part of the history.
For those who didn't give a try to nf_tables yet, there's a quick howto
available from Eric Leblond that describes how to get things working [6].
Comments/reviews welcome.
Thanks!
[1] http://lwn.net/Articles/324251/
[2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf
[3] http://lwn.net/Articles/564095/
[4] http://people.netfilter.org/pablo/map-pending-work.txt
[4] http://people.netfilter.org/pablo/nftables-todo.txt
[5] https://home.regit.org/netfilter-en/nftables-quick-howto/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a comment mentioning IRQF_DISABLED,
which is deprecated.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch proposes to remove the use of the IRQF_DISABLED flag
It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai says:
====================
net/mlx4: Mellanox driver update 15-10-2013
This patchset contains small code cleaning patches, and a patch to make
mlx4_core use module_request() in order to load the relevant link layer module
(mlx4_en or mlx4_ib) according to the port type.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mellanox ConnectX architecture is: mlx4_core is the lower level
PCI driver which register on the PCI id, and protocol specific drivers
are depended on it: mlx4_en - for Ethernet and mlx4_ib for Infiniband.
NIC could have multiple ports which can change their type dynamically.
We use the request_module() call to load the relevant protocol driver
when needed: on loading time or at port type change event.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up warning added by commit fe6f700d "net/mlx4_core: Respond to
operation request by firmware".
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small code cleanup:
1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN
2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
other MLX4_SET_PORT_yyy
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove code that triggers trivial build warnings.
drivers/net/ethernet/mellanox/mlx4/cmd.c: In function ‘mlx4_set_vf_vlan’:
drivers/net/ethernet/mellanox/mlx4/cmd.c:2256: warning: variable ‘vf_oper’ set but not used
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_mode’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:648: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_id’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:685: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_hw_rule_sz’:
drivers/net/ethernet/mellanox/mlx4/mcg.c:712: warning: comparison of unsigned expression < 0 is always false
drivers/net/ethernet/mellanox/mlx4/fw.c: In function ‘mlx4_opreq_action’:
drivers/net/ethernet/mellanox/mlx4/fw.c:1732: warning: variable ‘type_m’ set but not used
drivers/net/ethernet/mellanox/mlx4/srq.c:302: warning: no previous prototype for ‘mlx4_srq_lookup’
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP listener refactoring, part 6 :
Use sock_gen_put() from inet_diag_dump_one_icsk() for future
SYN_RECV support.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- ensure RecordRoute information is added to BAT_ICMP echo_request/reply only
- use VLAN_ETH_HLEN when possible
- use htons when possible
- substitute old fragmentation code with a new improved implementation by
Martin Hundebøll
- create common header for BAT_ICMP packets to improve extendibility
- consider the network coding overhead when computing the overall room needed by
batman headers
- add dummy soft-interface rx mode handler
- minor code refactoring and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJSWWk3AAoJEADl0hg6qKeOA7sQALZHZ6T2w1fw3VF5kgBsbtaY
o4GC/9VWuXhJLNyqoDZ11fwohgjXDYQeXDXHBKugSbNwsjRF6nyVtfvFnb8SMCOH
w2u8UPUkafS+AbZswj+VhnMZDpa/QRYeHV+LhgNIEQFIShZpZrAx8DNqBp8A0h17
ifDo6wJQVfe5vZcgsxMaenxa4m/i7GUKDs34kzOFxGFWHkC0t9sPhqSz5gMhQc3j
4jO4B3UVcuqlQBmyeR5cSe4qWW3cQEXbvv52vgsMna+H3zDznK49j4YUCAFcNenN
Tkz6yb8JrrbJs8PCQ9ezOF1hDQJ88SmTNB3zAYiHjbxxgXdn2TlOGdw79jSVPRVL
aDAEQfspKi79ZhiS0WZEfzjOL4r3NNcHtWnA/OHubvvoTdwK7asRTFklzvLfdC7N
FvIFaIWDY1jwbV6fvEupDs0uKajt6H9ISBlZutWPv29JtlcDkZRdpWifF9IrSNPt
aUHmteajS2WAJLj3RBpfmwDttZki4pLR3uJyI9hCgA+TmLGOtltTjtk2g2Lt40qn
ytAxey+0rFmgpKCBtffmCo8k0bwPj4iWFAwXKI6SO1j6e35zZyGML1bC9COwqzZS
EjtLMjREBw1MOdI8ZQldEX4+DZJeF2I9YzkadBfzhhnJWyAHoVEFLjgOzSe/by5p
cnv4plUGNtxT7rdRMvnk
=Tg/d
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- ensure RecordRoute information is added to BAT_ICMP echo_request/reply only
- use VLAN_ETH_HLEN when possible
- use htons when possible
- substitute old fragmentation code with a new improved implementation by
Martin Hundebøll
- create common header for BAT_ICMP packets to improve extendibility
- consider the network coding overhead when computing the overall room needed by
batman headers
- add dummy soft-interface rx mode handler
- minor code refactoring and cleanups
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is one fix for the hotplug memory path that resolves a regression
when removing memory that showed up in 3.12-rc1.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJgHqMACgkQMUfUDdst+ykplQCgym3aZCnwCAASgrM5g0fyM5LW
eR4AmQGqYBZBticsmGqLWDyUksrnL/6S
=zRIr
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is one fix for the hotplug memory path that resolves a regression
when removing memory that showed up in 3.12-rc1"
* tag 'driver-core-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Release device_hotplug_lock when store_mem_state returns EINVAL
Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option USB
serial driver for new Huawei devices. Other than that, just some small
bug fixes for issues that people have reported (run-time and
build-time), nothing major.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJgHZMACgkQMUfUDdst+yl5xgCePinICkasywmkWO0CgndCelCx
nyMAoIQ51ot0igMbmXoSAyq5u0EgVHmG
=/iU9
-----END PGP SIGNATURE-----
Merge tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option
USB serial driver for new Huawei devices. Other than that, just some
small bug fixes for issues that people have reported (run-time and
build-time), nothing major"
* tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
usb/chipidea: fix oops on memory allocation failure
usb-storage: add quirk for mandatory READ_CAPACITY_16
usb: serial: option: blacklist Olivetti Olicard200
USB: quirks: add touchscreen that is dazzeled by remote wakeup
Revert "usb: musb: gadget: fix otg active status flag"
USB: quirks.c: add one device that cannot deal with suspension
USB: serial: option: add support for Inovia SEW858 device
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.
USB: support new huawei devices in option.c
usb: musb: start musb on the udc side, too
xhci: Fix spurious wakeups after S5 on Haswell
xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers
xhci: quirk for extra long delay for S4
xhci: Don't enable/disable RWE on bus suspend/resume.
Here are two serial driver fixes for your tree. One is a revert of a
patch that causes a build error, the other is a fix to provide the
correct brace placement which resolves a bug where the driver was not
working properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJgH5oACgkQMUfUDdst+yn5ywCfU4Xq4jDcUyc3pEoKfX2kkNZk
7a8An2brQo3h20xHsIv2NprgYqCjfZpx
=rcEE
-----END PGP SIGNATURE-----
Merge tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are two serial driver fixes for your tree. One is a revert of a
patch that causes a build error, the other is a fix to provide the
correct brace placement which resolves a bug where the driver was not
working properly"
* tag 'tty-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: vt8500: add missing braces
Revert "serial: i.MX: evaluate linux,stdout-path property"
Here are some small iio and w1 driver fixes for 3.12-rc6.
There is also a hyper-v fix in here, which turned out to be incorrect,
so it was reverted. That will probably have to wait unto 3.13-rc1 to
get accepted as it's still being discussed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJgHiwACgkQMUfUDdst+ykS3gCgiJqKAbzsb+hNiI/AXmCHtcWV
cK4AoIEXbg+nMYPzSuSPz1nr2vJPr3CP
=5WpE
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small iio and w1 driver fixes for 3.12-rc6.
There is also a hyper-v fix in here, which turned out to be incorrect,
so it was reverted. That will probably have to wait unto 3.13-rc1 to
get accepted as it's still being discussed"
* tag 'char-misc-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "Drivers: hv: vmbus: Fix a bug in channel rescind code"
Drivers: hv: vmbus: Fix a bug in channel rescind code
iio:buffer: Free active scan mask in iio_disable_all_buffers()
iio: frequency: adf4350: add missing clk_disable_unprepare() on error in adf4350_probe()
w1 - call request_module with w1 master mutex unlocked
w1 - fix fops in w1_bus_notify
Nikita Kiryanov says:
====================
dm9000 improvements
This is a collection of improvements and bug fixes for dm9000, mostly
related to its startup and resume-from-suspend sequences.
Patch "Implement full reset of DM9000 network device" was submitted to the
linux-kernel mailing list but never applied.
An archive of the submission and the following conversation can be found here:
http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/02817.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the LPA by checking mii_if_info, instead of just saying "no LPA" every
time.
Cc: David S. Miller <davem@davemloft.net>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il>
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
A Davicom application note for the DM9000 network device recommends
performing software reset twice to correctly initialise the device.
Without this reset some devices fail to initialise correctly on
system startup.
Cc: David S. Miller <davem@davemloft.net>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take the phy out of reset explicitly during system resume to avoid
losing network connectivity.
Cc: David S. Miller <davem@davemloft.net>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il>
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the changes introduced in commit 6741f40 (DM9000B: driver
initialization upgrade) break functionality on DM9000A
(error message during NFS boot: "dm9000 dm9000.0: eth0: link down")
Since the changes were meant to serve only DM9000B, make them
dependent on the chip type.
Cc: David S. Miller <davem@davemloft.net>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Nikita Kiryanov <nikita@compulab.co.il>
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap
regression fix, and kernel memory leak fix in hdsp driver.
Hopefully this will be the last pull request for 3.12...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJSX6ebAAoJEGwxgFQ9KSmkkiIP/1x53z2Ri08yXApUm4pzSmqo
4a2N7sfvv/iXZnRpG++RN3UOiqGYdx2Yncn0cdPLrLmw7T/GxYw3Y0C2nHZNiOLa
f1Dw0KkZ/l7mPkpSSxYLOeNCA4i8buvC11M2vQ/453XrkSBvCCt1L05fvpO/GDeT
2u8ceDmata+ozKc6K3hwrc0SOKWud0h9dqxpREnY+oJp4/t48/uuswtw6tNqllze
HsHlOwGOLec5YJDg68S9DuI1jvcwnZ6acH/9NtUEzh7QfekngHgYpJEr3Bkpk+CW
xzvZw8UrHn1YlIzn1HIYa/rFWDsFM3FyFUaLRsfX8Vccq2H5oaUOcCrKNjTSRrSf
7DRG/J4aB733aZ4R4emzcRiQPul3mb9ss7Sg1+Ec0nPHQLGwG1tDR4Ph4E8cTPgd
DL6Xsmot90rPD6KQOaMyqpuWwRg0wzT/AsOEHqC2m1JlMjn0aZHQh+/cxVO5beSv
bYOvnq/HLt8T8JyetQmkb4Q7jsGLSPjYuPcV8QH4kSfnBYybKu52zPXc00Ph3N7O
ZyBfXBqTtHuIXYuDZisRJZk/zs+fOqekjCevF94N4zfIDi0exKsTwkRmZ1+e9uiv
0wnlEsNoy2Ts30imngLtf10RxlpjrCZ2VLp/CsOTcrbH6Mk76niXwMbs/5OVwkLl
6SEQ6OhqeERc7vKJX1bg
=AeDk
-----END PGP SIGNATURE-----
Merge tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All reasonably small fixes as rc6: a HD-audio mic fix, a us122l mmap
regression fix, and kernel memory leak fix in hdsp driver. Hopefully
this will be the last pull request for 3.12..."
* tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hdsp - info leak in snd_hdsp_hwdep_ioctl()
ALSA: us122l: Fix pcm_usb_stream mmapping regression
ALSA: hda - Fix inverted internal mic not indicated on some machines
Pull apparmor fixes from James Morris:
"A couple more regressions fixed"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix bad lock balance when introspecting policy
apparmor: fix memleak of the profile hash
Two little ones this time:
1) A missing clk_unprepare in adf4350.
2) A missing free of the active_scan_mask when iio_disable_all_buffers is
called during an unexpected device removal. This leak was introduced by
the fix
a87c82e454 iio: Stop sampling when the device is removed
and hence is a regression fix.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJSXbrBAAoJEFSFNJnE9BaILWgP/jMEnyNq4xImBkpLgYyJvF8F
s4Pt8on4/F+JwV21gljJB8g8JiblSAECTbxFaUDCAV6ufJ/0djSwb0DDQ1JfdfGI
a/2vC4YuhTGPi35s9NIXr9TuGS1cdYQ3o+bijNJiGJXIakI/ngwDWOpDE1MOFVv3
0YVHKkhOARuObJ5L+5xYeCsGdNuv9eQ7SvhLyy1LJ4123wCu2cRKFBWP/UX9wJlF
hLaeKXoR/jDiPaal6vIftLPDMsOILymUqjAy6QpQpDAp7zL9YenG8tSKiOWRgxE2
gMfsbJks2KM7LyWY8i6tyF0ID62QYh2PMYEQVmKdOH7ffROeO1WamXshT2ls4X++
+NB0Ih4cJFT2ryEGuG7R2isa36IP0KXTE2aPI2LmDHyJrHxxazfRooEOUks3gn+I
P1RhkJTJ8It39EiV+DgzFowEi5kwolUBv+7y56eqQ1GBWVVyutr5mmN8WKhT7fjE
3xMdaBCwbRCK6f/Kvl5TY4rIL8lG/aVP8X3bkJQFiduAtCrqphX+djFdCRn5nILZ
FALZmNZn9tI47kksEy3L+WhiOGf/7h13jULF+EHgYDhJuJ/KT8AlggqQIh/AFXWF
cDX92gC9mBCtlopYCLYh4ZDHr2L13X05W24irsIZak9TGEw8IXpPOWIqxUrLXMWl
zyeW1/f8ZO7C0E+sV0gu
=Whi7
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-3.12c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-linus
Jonathan writes:
Third set of IIO fixes for the 3.12 cycle.
Two little ones this time:
1) A missing clk_unprepare in adf4350.
2) A missing free of the active_scan_mask when iio_disable_all_buffers is
called during an unexpected device removal. This leak was introduced by
the fix
a87c82e454 iio: Stop sampling when the device is removed
and hence is a regression fix.
Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
changed the conditional around the declaration of usb_nop_xceiv_register
from
#if defined(CONFIG_NOP_USB_XCEIV) ||
(defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
to
#if IS_ENABLED(CONFIG_NOP_USB_XCEIV)
While that looks the same, it is semantically different. The first expression
is true if CONFIG_NOP_USB_XCEIV is built as module and if the including
code is built as module. The second expression is true if code depending on
CONFIG_NOP_USB_XCEIV if built as module or into the kernel.
As a result, the arm:allmodconfig build fails with
arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
`usb_nop_xceiv_register'
Fix the problem by reverting to the old conditional.
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 90d33f3ec5 as it's not
the correct fix for this issue, and it causes a build warning to be
added to the kernel tree.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Two functions defined in device_pm.c, acpi_dev_pm_add_dependent()
and acpi_dev_pm_remove_dependent(), have no callers and may be
dropped, so drop them.
Moreover, they are the only functions adding entries to and removing
entries from the power_dependent list in struct acpi_device, so drop
that list too.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Previously, we wanted SCSI devices corrsponding to ATA devices to
be runtime resumed when the power resource for those ATA device was
turned on by some other device, so we added the SCSI device to the
dependent device list of the ATA device's ACPI node. However, this
code has no effect after commit 41863fc (ACPI / power: Drop automaitc
resume of power resource dependent devices) and the mechanism it was
supposed to implement is regarded as a bad idea now, so drop it.
[rjw: Changelog]
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm: revert mremap pud_free anti-fix
mm: fix BUG in __split_huge_page_pmd
swap: fix set_blocksize race during swapon/swapoff
procfs: call default get_unmapped_area on MMU-present architectures
procfs: fix unintended truncation of returned mapped address
writeback: fix negative bdi max pause
percpu_refcount: export symbols
fs: buffer: move allocation failure loop into the allocator
mm: memcg: handle non-error OOM situations more gracefully
tools/testing/selftests: fix uninitialized variable
block/partitions/efi.c: treat size mismatch as a warning, not an error
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
mm/zswap: bugfix: memory leak when re-swapon
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
mm: migration: do not lose soft dirty bit if page is in migration state
gcov: MAINTAINERS: Add an entry for gcov
mm/hugetlb.c: correct missing private flag clearing
mm/vmscan.c: don't forget to free shrinker->nr_deferred
ipc/sem.c: synchronize semop and semctl with IPC_RMID
ipc: update locking scheme comments
...
Revert commit 1ecfd533f4 ("mm/mremap.c: call pud_free() after fail
calling pmd_alloc()").
The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map()
ensure that the pud, pmd, pt is already allocated, and seldom do they
need to allocate; on failure, upper levels are freed if appropriate by
the subsequent do_munmap(). Whereas commit 1ecfd533f4 did an
unconditional pud_free() of a most-likely still-in-use pud: saved only
by the near-impossiblity of pmd_alloc() failing.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
__split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
It's invalid: we don't always have down_write of mmap_sem there: a racing
do_huge_pmd_wp_page() might have copied-on-write to another huge page
before our split_huge_page() got the anon_vma lock.
Forget the BUG_ON, just go back and try again if this happens.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix race between swapoff and swapon. Swapoff used old_block_size from
swap_info outside of swapon_mutex so it could be overwritten by
concurrent swapon.
The race has visible effect only if more than one swap block device
exists with different block sizes (e.g. /dev/sda1 with block size 4096
and /dev/sdb1 with 512). In such case it leads to setting the blocksize
of swapped off device with wrong blocksize.
The bug can be triggered with multiple concurrent swapoff and swapon:
0. Swap for some device is on.
1. swapoff:
First the swapoff is called on this device and "struct swap_info_struct
*p" is assigned. This is done under swap_lock however this lock is
released for the call try_to_unuse().
2. swapon:
After the assignment above (and before acquiring swapon_mutex &
swap_lock by swapoff) the swapon is called on the same device.
The p->old_block_size is assigned to the value of block_size the device.
This block size should be the same as previous but sometimes it is not.
The swapon ends successfully.
3. swapoff:
Swapoff resumes, grabs the locks and mutex and continues to disable this
swap device. Now it sets the block size to value taken from swap_info
which was overwritten by swapon in 2.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reported-by: Weijie Yang <weijie.yang.kh@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c4fe244857 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.
To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file
operation in the procfs file, is not defined.
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64. This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.
Fixes a regression added in commit c4fe244857 ("sparc: fix PCI device
proc file mmap(2)").
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Toralf runs trinity on UML/i386. After some time it hangs and the last
message line is
BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]
It's found that pages_dirtied becomes very large. More than 1000000000
pages in this case:
period = HZ * pages_dirtied / task_ratelimit;
BUG_ON(pages_dirtied > 2000000000);
BUG_ON(pages_dirtied > 1000000000); <---------
UML debug printf shows that we got negative pause here:
ick: pause : -984
ick: pages_dirtied : 0
ick: task_ratelimit: 0
pause:
+ if (pause < 0) {
+ extern int printf(char *, ...);
+ printf("ick : pause : %li\n", pause);
+ printf("ick: pages_dirtied : %lu\n", pages_dirtied);
+ printf("ick: task_ratelimit: %lu\n", task_ratelimit);
+ BUG_ON(1);
+ }
trace_balance_dirty_pages(bdi,
Since pause is bounded by [min_pause, max_pause] where min_pause is also
bounded by max_pause. It's suspected and demonstrated that the
max_pause calculation goes wrong:
ick: pause : -717
ick: min_pause : -177
ick: max_pause : -717
ick: pages_dirtied : 14
ick: task_ratelimit: 0
The problem lies in the two "long = unsigned long" assignments in
bdi_max_pause() which might go negative if the highest bit is 1, and the
min_t(long, ...) check failed to protect it falling under 0. Fix all of
them by using "unsigned long" throughout the function.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Export the interface to be used within modules.
Signed-off-by: Matias Bjorling <m@bjorling.me>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.
The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all. Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap. This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.
Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.
Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3812c8c8f3 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The err variable is intended to receive the timer_create() return before
checking it
Signed-off-by: Felipe Pena <felipensp@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 27a7c64217 ("partitions/efi: account for pmbr size in lba")
we started treating bad sizes in lba field of the partition that has the
0xEE (GPT protective) as errors.
However, we may run into these "bad sizes" in the real world if someone
uses dd to copy an image from a smaller disk to a bigger disk. Since
this case used to work (even without using force_gpt), keep it working
and treat the size mismatch as a warning instead of an error.
Reported-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Tested-by: Artem Bityutskiy <dedekind1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 11feeb4980 ("kvm: optimize away THP checks in
kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
compound pages.
That commit depends on the assumption that PG_reserved is identical for
all head and tail pages of a compound page. So that if get_user_pages
returns a tail page, we don't need to check the head page in order to
know if we deal with a reserved page that requires different
refcounting.
The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic hugepages
allocated through bootmem don't clear the PG_reserved on the tail pages
(the clearing of PG_reserved is done later only if the gigantic hugepage
is freed).
This patch corrects the gigantic compound page initialization so that we
can retain the optimization in 11feeb4980. The cacheline was already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
[akpm@linux-foundation.org: tweak comment layout and grammar]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: andy123 <ajs124.ajs124@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon,
so a memory leak occurs.
Free the memory of zswap_tree in zswap_frontswap_invalidate_area().
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
From: Weijie Yang <weijie.yang@samsung.com>
Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Consider the following scenario:
thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
finished, entry x and its zbud is not freed as its refcount != 0
now, the swap_map[x] = 0
thread 0: now call zswap_get_swap_cache_page
swapcache_prepare return -ENOENT because entry x is not used any more
zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
zswap_writeback_entry do nothing except put refcount
Now, the memory of zswap_entry x and its zpage leak.
Modify:
- check the refcount in fail path, free memory if it is not referenced.
- use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
can be not only caused by nomem but also by invalidate.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If page migration is turned on in config and the page is migrating, we
may lose the soft dirty bit. If fork and mprotect are called on
migrating pages (once migration is complete) pages do not obtain the
soft dirty bit in the correspond pte entries. Fix it adding an
appropriate test on swap entries.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should clear the page's private flag when returing the page to the
hugepage pool. Otherwise, marked hugepage can be allocated to the user
who tries to allocate the non-reserved hugepage. If this user fail to
map this hugepage, he would try to return the page to the hugepage pool.
Since this page has a private flag, resv_huge_pages would mistakenly
increase. This patch fixes this situation.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After acquiring the semlock spinlock, operations must test that the
array is still valid.
- semctl() and exit_sem() would walk stale linked lists (ugly, but
should be ok: all lists are empty)
- semtimedop() would sleep forever - and if woken up due to a signal -
access memory after free.
The patch also:
- standardizes the tests for .deleted, so that all tests in one
function leave the function with the same approach.
- unconditionally tests for .deleted immediately after every call to
sem_lock - even it it means that for semctl(GETALL), .deleted will be
tested twice.
Both changes make the review simpler: After every sem_lock, there must
be a test of .deleted, followed by a goto to the cleanup code (if the
function uses "goto cleanup").
The only exception is semctl_down(): If sem_ids().rwsem is locked, then
the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
additional test is required.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The initial documentation was a bit incomplete, update accordingly.
[akpm@linux-foundation.org: make it more readable in 80 columns]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
for_each_online_cpu() needs the protection of {get,put}_online_cpus() so
cpu_online_mask doesn't change during the iteration.
cpu_hotplug.lock is held while a cpu is going down, it's a coarse lock
that is used kernel-wide to synchronize cpu hotplug activity. Memcg has
a cpu hotplug notifier, called while there may not be any cpu hotplug
refcounts, which drains per-cpu event counts to memcg->nocpu_base.events
to maintain a cumulative event count as cpus disappear. Without
get_online_cpus() in mem_cgroup_read_events(), it's possible to account
for the event count on a dying cpu twice, and this value may be
significantly large.
In fact, all memcg->pcp_counter_lock use should be nested by
{get,put}_online_cpus().
This fixes that issue and ensures the reported statistics are not vastly
over-reported during cpu hotplug.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When inserting a wrong value to /sys/devices/system/memory/memoryX/state file,
following messages are shown. And device_hotplug_lock is never released.
================================================
[ BUG: lock held when returning to user space! ]
3.12.0-rc4-debug+ #3 Tainted: G W
------------------------------------------------
bash/6442 is leaving the kernel with locks still held!
1 lock held by bash/6442:
#0: (device_hotplug_lock){+.+.+.}, at: [<ffffffff8146cbb5>] lock_device_hotplug_sysfs+0x15/0x50
This issue was introdued by commit fa2be40 (drivers: base: use standard
device online/offline for state change).
This patch releases device_hotplug_lcok when store_mem_state returns EINVAL.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
CC: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>