Commit Graph

602372 Commits

Author SHA1 Message Date
Linus Torvalds
d46d0256cd Merge branch 'akpm' (patches from Andrew)
Merge various fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
  mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
  mm, oom_reaper: do not use siglock in try_oom_reaper()
  mm, page_alloc: prevent infinite loop in buffered_rmqueue()
  checkpatch: reduce git commit description style false positives
  mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
  memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
  mm: check the return value of lookup_page_ext for all call sites
  kdump: fix dmesg gdbmacro to work with record based printk
  mm: fix overflow in vm_map_ram()
2016-06-04 10:51:29 -07:00
Al Viro
fac7d1917d fix EOPENSTALE bug in do_last()
EOPENSTALE occuring at the last component of a trailing symlink ends up
with do_last() retrying its lookup.  After the symlink body has been
discarded.  The thing is, all this retry_lookup logics in there is not
needed at all - the upper layers will do the right thing if we simply
return that -EOPENSTALE as we would with any other error.  Trying to
microoptimize in do_last() is a lot of headache for no good reason.

Cc: stable@vger.kernel.org # v4.2+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Reviewed-and-Tested-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-04 11:59:04 -04:00
Geert Uytterhoeven
182fd9eecb MAINTAINERS: Add file patterns for wireless device tree bindings
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-06-04 17:24:02 +03:00
David S. Miller
76f21b9900 net: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 22:56:28 -07:00
David S. Miller
3b55a537d0 sctp: Fix warning in sctp_packet_transmit_chunk()
size_t objects should be printed with %Z printf format.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 22:53:26 -07:00
Yuval Mintz
81b1251d3b qed: Fix next-ptr chains for BE / 32-bit
Commit a91eb52abb ("qed: Revisit chain implementation") contains an
incorrect implementation for BE platforms, as device's regpairs containing
addresses are LE and they're not converted correctly when read back.
In addition, it raises a compilation warning for 32-bit platforms where
dma_addr_t is a 32-bit variable.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 22:42:26 -07:00
David S. Miller
03c7f70bee Merge branch 'qed-roce-iscsi'
Yuval Mintz says:

====================
qed: RocE & iSCSI infrastructure

We plan on sending 2 new protocol drivers in the imminent future -
both our RoCE [qedr] and iSCSI [qedi] drivers. As both submissions
would be rather massive and in order to avoid collisions between them,
the common infrastructure on the qed side was prepared as an independent
patch-series to be sent ahead of those 2 submissions.

This patch series introduces in QED 2 new 'ids' - one for iscsi and
one for roce. It then goes and adds logic required for configuring
said protocols in HW. Notice it *doesn't* actually add any client using
said ids, but rather only the infrastructure to allow their later usage.

What this patch doesn't contain is the slowpath protocol-configuration
toward the firmware. I.e., it contains register-setting logic, memory
allocations, etc., but not actual flow-related configuration specific
to the protocl. Those would be sent as part of the protocol driver
submissions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 20:08:45 -04:00
Yuval Mintz
dbb799c397 qed: Initialize hardware for new protocols
RoCE and iSCSI would require some added/changed hw configuration in order
to properly run; The biggest single change being the requirement of
allocating and mapping host memory for several HW blocks that aren't being
used by qede [SRC, QM, TM, etc.].

In addition, whereas qede is only using context memory for HW blocks, the
new protocol would also require task memories to be added.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 20:08:40 -04:00
Yuval Mintz
c5ac93191d qed: Add iscsi/rdma personalities
This patch adds in the ecore 2 new personalities in addition to
QED_PCI_ETH - QED_PCI_ISCSI and QED_PCI_ETH_ROCE.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 20:08:39 -04:00
Yuval Mintz
7a9b6b8f6e qed: Add common HSI for new protocols
This adds the qed portion of the RoCE & iSCSI firmware HSI,
as well as adding several new common HSI files which would be required
by both qed and qed* protocols.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 20:08:39 -04:00
Yuval Mintz
a91eb52abb qed: Revisit chain implementation
RoCE driver is going to need a 32-bit chain [current chain implementation
for qed* currently supports only 16-bit producer/consumer chains].

This patch adds said support, as well as doing other slight tweaks and
modifications to qed's chain API.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 20:08:39 -04:00
David S. Miller
e7eacc9e0b Merge branch 'mediatek-fixes'
John Crispin says:

====================
net-next: mediatek: improve phy support

The current driver did not handle the RGMII delay modes and asymmetric flow
control properly. The mii_bus is not freed properly. Also add support for
fixed-phy allowing the driver to work on SoCs that have an internal gigabit
switch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:54:23 -04:00
John Crispin
37920fce0f net-next: mediatek: properly handle RGMII modes
If an external Gigabit PHY is connected to either of the MACs we need to
be able to tell the PHY to use a delay. Not doing so will result in heavy
packet loss and/or data corruption when using PHYs such as the IC+ IP1001.
We tell the PHY which MII delay mode to use via the devictree.

The ethernet driver needs to be adapted to handle all 3 rgmii-*id modes
in the same way as normal rgmii when setting up the MAC.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:54:16 -04:00
John Crispin
0c72c50f6f net-next: mediatek: add fixed-phy support
The MT7623 SoC has a builtin gigabit switch. If we want to use it, GMAC1
needs to be configured using a fixed link speed and flow control settings.
The easiest way to do this is to used the fixed-phy driver, allowing us to
reuse the existing mdio polling code to setup the MAC.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:54:16 -04:00
John Crispin
08ef55c6f2 net-next: mediatek: fix gigabit and flow control advertisement
The current code will not setup the PHYs advertisement features correctly.
Fix this and properly advertise Gigabit features and properly handle
asymmetric pause frames.

Signed-off-by: Sean Wang <keyhaede@gmail.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:54:16 -04:00
John Crispin
207bdf1844 net-next: mediatek: use mdiobus_free() in favour of kfree()
The driver currently uses kfree() to clear the mii_bus. This is not the
correct way to clear the memory and mdiobus_free() should be used instead.
This patch fixes the two instances where this happens in the driver.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:54:16 -04:00
Ivan Khoronzhuk
330348d942 net: ethernet: ti: cpsw: remove unused priv lock
There is no reason in this lock. At least for now.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:43:26 -04:00
Joe Perches
9b6d53985f rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB
Use the more common kernel logging style and reduce object size.

The logging message prefix changes from a mixture of
"RxRPC:" and "RXRPC:" to "af_rxrpc: ".

$ size net/rxrpc/built-in.o*
   text	   data	    bss	    dec	    hex	filename
  64172	   1972	   8304	  74448	  122d0	net/rxrpc/built-in.o.new
  67512	   1972	   8304	  77788	  12fdc	net/rxrpc/built-in.o.old

Miscellanea:

o Consolidate the ASSERT macros to use a single pr_err call with
  decimal and hexadecimal output and a stringified #OP argument

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:41:31 -04:00
Haiyang Zhang
cb2911fed6 hv_netvsc: Fix VF register on vlan devices
Added a condition to avoid vlan devices with same MAC registering
as VF.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:40:05 -04:00
David S. Miller
db54275dff Merge branch 'sctp-gso'
Marcelo Ricardo Leitner says:

====================
sctp: Add GSO support

This patchset adds sctp GSO support.

Performance tests indicates that increases throughput by 10% if using
bigger chunk sizes, specially if bigger than MTU. For small chunks, it
doesn't help much if not using heavy firewall rules.

For small chunks it will probably be of more use once we get something
like MSG_MORE as David Laight had suggested.

overall changes:
v1->v2:
Added support for receiving GSO frames on SCTP stack, as requested by
Dave Miller.

v2->v3:
Consider sctphdr size in skb_gso_transport_seglen()
rebased due to 5c7cdf339a ("gso: Remove arbitrary checks for
unsupported GSO")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:26 -04:00
Marcelo Ricardo Leitner
942b3235bf sctp: improve debug message to also log curr pkt and new chunk size
This is useful for debugging packet sizes.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:22 -04:00
Marcelo Ricardo Leitner
90017accff sctp: Add GSO support
SCTP has this pecualiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to IP layer.

This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fills cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339a ("gso: Remove arbitrary checks for
  unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner
3acb50c18d sctp: delay as much as possible skb_linearize
This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.

This patch thus delays as much as possible the call to skb_linearize,
leaving it to sctp_inq_pop() moment. For that the sanity checks
performed now know how to deal with fragments.

One positive side-effect of this is that if the socket is backlogged it
will have the chance of doing it on backlog processing instead of
during softirq.

With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner
ae7ef81ef0 skbuff: introduce skb_gso_validate_mtu
skb_gso_network_seglen is not enough for checking fragment sizes if
skb is using GSO_BY_FRAGS as we have to check frag per frag.

This patch introduces skb_gso_validate_mtu, based on the former, which
will wrap the use case inside it as all calls to skb_gso_network_seglen
were to validate if it fits on a given TMU, and improve the check.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner
3953c46c3a sk_buff: allow segmenting based on frag sizes
This patch allows segmenting a skb based on its frags sizes instead of
based on a fixed value.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner
57c0565039 skbuff: export skb_gro_receive
sctp GSO requires it and sctp can be compiled as a module, so we need to
export this function.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Marcelo Ricardo Leitner
f6c382fc55 loopback: make use of NETIF_F_GSO_SOFTWARE
NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so lets
make use of it in loopback code. Note that veth/vxlan/others already
uses it.

Within this patch series, this patch causes lo to pick up SCTP GSO feature
automatically (as it's added to NETIF_F_GSO_SOFTWARE) and thus avoiding
segmentation if possible.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:37:21 -04:00
Ivan Khoronzhuk
8478b6cdc1 net: ethernet: ti: cpsw: fix rx-usecs interrupt pacing consistency
The rx-usecs shouldn't be changed while interface down/up.
Currently, for instance, if it's set to 100us, after interface
down/up it's 500us. It's a hidden bug that can lead to lavish
interrupt pacing time increasing while "down/up" up to max value.

Steps to reproduce:
- set rx-usecs to be 100us
- down/up interface
- read new unexpected rx-usecs

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:35:06 -04:00
Yangbo Lu
9c8b0778e4 gianfar: fix the last transmit buffer descriptor
When the transmit hardware timestamping is enabled, an additional
TxBD would be added and would be set as the last TxBD with TXBD_LAST
and TXBD_INTERRUPT. However this has been broken by a patch recently.
This made the software couldn't get transmit hardware timestamps and
resulted in call trace. So, this patch is to fix this issue.

Fixes: 48963b4492 ("gianfar: Remove redundant ops for do_tstamp
       from xmit()")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:32:23 -04:00
Bhaktipriya Shridhar
f2edc4e1b0 net: fjes: fjes_main: Remove create_workqueue
alloc_workqueue replaces deprecated create_workqueue().

The workqueue adapter->txrx_wq has workitem
&adapter->raise_intr_rxdata_task per adapter. Extended Socket Network
Device is shared memory based, so someone's transmission denotes other's
reception.  raise_intr_rxdata_task raises interruption of receivers from
the sender in order to notify receivers.

The workqueue adapter->control_wq has workitem
&adapter->interrupt_watch_task per adapter. interrupt_watch_task is used
to prevent delay of interrupts.

Dedicated workqueues have been used in both cases since the workitems
on the workqueues are involved in normal device operation and require
forward progress under memory pressure.

max_active has been set to 0 since there is no need for throttling
the number of active work items.

Since network devices  may be used for memory reclaim,
WQ_MEM_RECLAIM has been set to guarantee forward progress.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:29:42 -04:00
WANG Cong
8d5958f424 sch_tbf: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
d7f4f332f0 sch_red: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
6a73b571b6 sch_drr: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
6529d75ad9 sch_prio: update backlog as well
We need to update backlog too when we update qlen.

Joint work with Stas.

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
357cc9b4a8 sch_hfsc: always keep backlog updated
hfsc updates backlog lazily, that is only when we
dump the stats. This is problematic after we begin to
update backlog in qdisc_tree_reduce_backlog().

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
Linus Torvalds
8c52b6dcdd Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 - a few simple fixes for fallout from the recent gic-v3 changes
 - a workaround for a Cavium thunderX erratum
 - a bugfix for the pic32 irqchip to make external interrupts work proper
 - a missing return value in the generic IPI management code

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-pic32-evic: Fix bug with external interrupts.
  irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
  irqchip/gic-v3: Fix quiescence check in gic_enable_redist
  irqchip/gic-v3: Fix copy+paste mistakes in defines
  irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
  genirq: Fix missing return value in irq_destroy_ipi()
2016-06-03 16:12:35 -07:00
Mel Gorman
e46e7b77c9 mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations.  The
preferred zone is calculated on this basis for statistic purposes and as
a starting point in the zonelist iterator.

However, if the context can ignore memory policies due to being atomic
or being able to ignore watermarks then the starting point in the
zonelist iterator is no longer correct.  This patch resets the zonelist
iterator in the allocator slowpath if the context can ignore memory
policies.  This will alter the zone used for statistics but only after
it is known that it makes sense for that context.  Resetting it before
entering the slowpath would potentially allow an ALLOC_CPUSET allocation
to be accounted for against the wrong zone.  Note that while nodemask is
not explicitly set to the original nodemask, it would only have been
overwritten if cpuset_enabled() and it was reset before the slowpath was
entered.

Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:57 -07:00
Mel Gorman
0d0bd89435 mm, page_alloc: reset zonelist iterator after resetting fair zone allocation policy
Geert Uytterhoeven reported the following problem that bisected to
commit c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone
in a zonelist twice") on m68k/ARAnyM

    BUG: scheduling while atomic: cron/668/0x10c9a0c0
    Modules linked in:
    CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
    Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54
      __schedule+0x312/0x388
      __schedule+0x0/0x388
      prepare_to_wait+0x0/0x52
      schedule+0x64/0x82
      schedule_timeout+0xda/0x104
      set_next_entity+0x18/0x40
      pick_next_task_fair+0x78/0xda
      io_schedule_timeout+0x36/0x4a
      bit_wait_io+0x0/0x40
      bit_wait_io+0x12/0x40
      __wait_on_bit+0x46/0x76
      wait_on_page_bit_killable+0x64/0x6c
      bit_wait_io+0x0/0x40
      wake_bit_function+0x0/0x4e
      __lock_page_or_retry+0xde/0x124
      do_scan_async+0x114/0x17c
      lookup_swap_cache+0x24/0x4e
      handle_mm_fault+0x626/0x7de
      find_vma+0x0/0x66
      down_read+0x0/0xe
      wait_on_page_bit_killable_timeout+0x77/0x7c
      find_vma+0x16/0x66
      do_page_fault+0xe6/0x23a
      res_func+0xa3c/0x141a
      buserr_c+0x190/0x6d4
      res_func+0xa3c/0x141a
      buserr+0x20/0x28
      res_func+0xa3c/0x141a
      buserr+0x20/0x28

The relationship is not obvious but it's due to a failure to rescan the
full zonelist after the fair zone allocation policy exhausts the batch
count.  While this is a functional problem, it's also a performance
issue.  A page allocator microbenchmark showed the following

                                   4.7.0-rc1                  4.7.0-rc1
                                     vanilla                 reset-v1r2
  Min      alloc-odr0-1     327.00 (  0.00%)           326.00 (  0.31%)
  Min      alloc-odr0-2     235.00 (  0.00%)           235.00 (  0.00%)
  Min      alloc-odr0-4     198.00 (  0.00%)           198.00 (  0.00%)
  Min      alloc-odr0-8     170.00 (  0.00%)           170.00 (  0.00%)
  Min      alloc-odr0-16    156.00 (  0.00%)           156.00 (  0.00%)
  Min      alloc-odr0-32    150.00 (  0.00%)           150.00 (  0.00%)
  Min      alloc-odr0-64    146.00 (  0.00%)           146.00 (  0.00%)
  Min      alloc-odr0-128   145.00 (  0.00%)           145.00 (  0.00%)
  Min      alloc-odr0-256   155.00 (  0.00%)           155.00 (  0.00%)
  Min      alloc-odr0-512   168.00 (  0.00%)           165.00 (  1.79%)
  Min      alloc-odr0-1024  175.00 (  0.00%)           174.00 (  0.57%)
  Min      alloc-odr0-2048  180.00 (  0.00%)           180.00 (  0.00%)
  Min      alloc-odr0-4096  187.00 (  0.00%)           186.00 (  0.53%)
  Min      alloc-odr0-8192  190.00 (  0.00%)           190.00 (  0.00%)
  Min      alloc-odr0-16384 191.00 (  0.00%)           191.00 (  0.00%)
  Min      alloc-odr1-1     736.00 (  0.00%)           445.00 ( 39.54%)
  Min      alloc-odr1-2     343.00 (  0.00%)           335.00 (  2.33%)
  Min      alloc-odr1-4     277.00 (  0.00%)           270.00 (  2.53%)
  Min      alloc-odr1-8     238.00 (  0.00%)           233.00 (  2.10%)
  Min      alloc-odr1-16    224.00 (  0.00%)           218.00 (  2.68%)
  Min      alloc-odr1-32    210.00 (  0.00%)           208.00 (  0.95%)
  Min      alloc-odr1-64    207.00 (  0.00%)           203.00 (  1.93%)
  Min      alloc-odr1-128   276.00 (  0.00%)           202.00 ( 26.81%)
  Min      alloc-odr1-256   206.00 (  0.00%)           202.00 (  1.94%)
  Min      alloc-odr1-512   207.00 (  0.00%)           202.00 (  2.42%)
  Min      alloc-odr1-1024  208.00 (  0.00%)           205.00 (  1.44%)
  Min      alloc-odr1-2048  213.00 (  0.00%)           212.00 (  0.47%)
  Min      alloc-odr1-4096  218.00 (  0.00%)           216.00 (  0.92%)
  Min      alloc-odr1-8192  341.00 (  0.00%)           219.00 ( 35.78%)

Note that order-0 allocations are unaffected but higher orders get a
small boost from this patch and a large reduction in system CPU usage
overall as can be seen here:

             4.7.0-rc1   4.7.0-rc1
               vanilla  reset-v1r2
  User           85.32       86.31
  System       2221.39     2053.36
  Elapsed      2368.89     2202.47

Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20160531100848.GR2527@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:56 -07:00
Michal Hocko
cbdcf7f789 mm, oom_reaper: do not use siglock in try_oom_reaper()
Oleg has noted that siglock usage in try_oom_reaper is both pointless
and dangerous.  signal_group_exit can be checked lockless.  The problem
is that sighand becomes NULL in __exit_signal so we can crash.

Fixes: 3ef22dfff2 ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")
Link: http://lkml.kernel.org/r/1464679423-30218-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:56 -07:00
Vlastimil Babka
83b9355bf6 mm, page_alloc: prevent infinite loop in buffered_rmqueue()
In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
buffered_rmqueue() when check_new_pcp() returns 1, because the bad page
is never removed from the pcp list.  Fix this by removing the page
before retrying.  Also we don't need to check if page is non-NULL,
because we simply grab it from the list which was just tested for being
non-empty.

Fixes: 479f854a20 ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Link: http://lkml.kernel.org/r/20160530090154.GM2527@techsingularity.net
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:56 -07:00
Joe Perches
879be4f378 checkpatch: reduce git commit description style false positives
Some lines in a commit log appear to be commit SHA1 ids like:

  ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0123456789ab ("commit description")'
  Link: http://lkml.kernel.org/r/40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com

Reduce the false positives.

Link: http://lkml.kernel.org/r/eda977eaa8328fef42bb3c87935d97e10ea8ff67.1464384023.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:56 -07:00
Vitaly Wool
43afc19417 mm/z3fold.c: avoid modifying HEADLESS page and minor cleanup
Fix erroneous z3fold header access in a HEADLESS page in reclaim
function, and change one remaining direct handle-to-buddy conversion to
use the appropriate helper.

Link: http://lkml.kernel.org/r/5748706F.9020208@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:55 -07:00
Tejun Heo
3a06bb78ce memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
memcg_offline_kmem() may be called from memcg_free_kmem() after a css
init failure.  memcg_free_kmem() is a ->css_free callback which is
called without cgroup_mutex and memcg_offline_kmem() ends up using
css_for_each_descendant_pre() without any locking.  Fix it by adding rcu
read locking around it.

    mkdir: cannot create directory `65530': No space left on device
    ===============================
    [ INFO: suspicious RCU usage. ]
    4.6.0-work+ #321 Not tainted
    -------------------------------
    kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
     [  527.243970] other info that might help us debug this:
     [  527.244715]
    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by kworker/0:5/1664:
     #0:  ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
     #1:  ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
     [  527.248098] stack backtrace:
    CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
    Workqueue: cgroup_destroy css_free_work_fn
    Call Trace:
      dump_stack+0x68/0xa1
      lockdep_rcu_suspicious+0xd7/0x110
      css_next_descendant_pre+0x7d/0xb0
      memcg_offline_kmem.part.44+0x4a/0xc0
      mem_cgroup_css_free+0x1ec/0x200
      css_free_work_fn+0x49/0x5e0
      process_one_work+0x1c5/0x4a0
      worker_thread+0x49/0x490
      kthread+0xea/0x100
      ret_from_fork+0x1f/0x40

Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>	[4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 16:02:55 -07:00
Linus Torvalds
2c22132563 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer bugfix from Thomas Gleixner:
 "A single bugfix for the error check wreckage we introduced in the
  merge window"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Make settimeofday error checking work again
2016-06-03 15:37:27 -07:00
Yang Shi
f86e427197 mm: check the return value of lookup_page_ext for all call sites
Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e.  memory hotplug.

Tested with ltp with "page_owner=0".

[1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

[akpm@linux-foundation.org: fix build-breaking typos]
[arnd@arndb.de: fix build problems from lookup_page_ext]
  Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 15:06:22 -07:00
Corey Minyard
d8bae33ddd kdump: fix dmesg gdbmacro to work with record based printk
Commit 7ff9554bb5 ("printk: convert byte-buffer to variable-length
record buffer") introduced a record based printk buffer.  Modify
gdbmacros.txt to parse this new structure so dmesg will work properly.

Link: http://lkml.kernel.org/r/1463515794-1599-1-git-send-email-minyard@acm.org
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 15:06:22 -07:00
Guillermo Julián Moreno
65ee03c4b9 mm: fix overflow in vm_map_ram()
When remapping pages accounting for 4G or more memory space, the
operation 'count << PAGE_SHIFT' overflows as it is performed on an
integer.  Solution: cast before doing the bitshift.

[akpm@linux-foundation.org: fix vm_unmap_ram() also]
[akpm@linux-foundation.org: fix vmap() as well, per Guillermo]
Link: http://lkml.kernel.org/r/etPan.57175fb3.7a271c6b.2bd@naudit.es
Signed-off-by: Guillermo Julián Moreno <guillermo.julian@naudit.es>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-03 15:06:22 -07:00
Linus Torvalds
e603330c86 Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just one fix to the ptrace code, spotted by Simon Marchi, where if a
  thread migrates to a different CPU and the VFP registers are changed
  through ptrace, the application doesn't see the updated VFP registers"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: fix PTRACE_SETVFPREGS on SMP systems
2016-06-03 14:39:29 -07:00
Linus Torvalds
d29e472301 arm64 fixes:
- Revert a previous revert and get hugetlb going with contiguous hints
 - Wire up missing compat syscalls
 - Enable CONFIG_SET_MODULE_RONX by default
 - Add missing line to our compat /proc/cpuinfo output
 - Clarify levels in our page table dumps
 - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
 - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
 - Remove some dead code and update a comment
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJXUZzdAAoJELescNyEwWM0Fm4H/215TJBXcohdIpTYX/Nr3+Bh
 5jOqMgEWKe6ipmXCnzWpZ1F1GHoxLxndSWhJbi1Gwpdc+EYdXWxT5Z/IsQMKZXOb
 n81RzJbit6hXlebZ99YbEjfokfpMATno/9pD8sr1F2Xg/LsrD00O/0aPk9Fp38vX
 VRK2TlaVdBQbF80rMI6fUEEVksbJBqS+jBlqwkRri9JPusP+F5F4Qg8f4VqXDSef
 ZDEBzaI1BjrIQhDGSf0baXQ9EvuyotT2VR1NKAbYzdK8JoLTYiacIDviU2TCsuYZ
 ujB3MTOUytxAJbFhC2KBiDJfukSNY8vg+fLFg7rM3qUj6TMw6ESe1goYVScljHY=
 =cAlD
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The main thing here is reviving hugetlb support using contiguous ptes,
  which we ended up reverting at the last minute in 4.5 pending a fix
  which went into the core mm/ code during the recent merge window.

   - Revert a previous revert and get hugetlb going with contiguous hints
   - Wire up missing compat syscalls
   - Enable CONFIG_SET_MODULE_RONX by default
   - Add missing line to our compat /proc/cpuinfo output
   - Clarify levels in our page table dumps
   - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
   - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
   - Remove some dead code and update a comment"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
  arm64: move {PAGE,CONT}_SHIFT into Kconfig
  arm64: mm: dump: log span level
  arm64: update stale PAGE_OFFSET comment
  drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
  drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
  drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
  arm64: report CPU number in bad_mode
  arm64: unistd32.h: wire up missing syscalls for compat tasks
  arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
  arm64: enable CONFIG_SET_MODULE_RONX by default
  arm64: Remove orphaned __addr_ok() definition
  Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
2016-06-03 14:29:47 -07:00
Linus Torvalds
5306d766f1 powerpc fixes for 4.7
- Handle RTAS delay requests in configure_bridge from Russell Currey
  - Refactor the configure_bridge RTAS tokens from Russell Currey
  - Fix definition of SIAR and SDAR registers from Thomas Huth
  - Use privileged SPR number for MMCR2 from Thomas Huth
  - Update LPCR only if it is powernv from Aneesh Kumar K.V
  - Fix the reference bit update when handling hash fault from Aneesh Kumar K.V
  - Add missing tlb flush from Aneesh Kumar K.V
  - Add POWER8NVL support to ibm,client-architecture-support call from Thomas Huth
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXUWmzAAoJEFHr6jzI4aWAuEAQAKAWjT4JCIQMzc2Qk0LbBFhD
 Mw3xIt618g0PLRLN431rqk9xE1phh9ecSNt40kQV0UYypCwQyYaOzAAHq7QibRFh
 wFDLyrO10aiqTVBI9Dl7l6ZDespGP4ufOyj4GF3vIndTQ8CLT6+Jj/cfaT8lNIna
 vOp0Q8RKoSuqJu7wOZwxGHc9vhdpDftBOi2Yl2KHWrWUMTVtyDzgRwmFiCk/9SvX
 +xYVGfbCRkU1b05X9RgbsbMdiAzC7odzAC8e4VNHqB+YHgBFenlOKcx6uuz5Oubv
 /0dGJq1OEKxKgkEWZsXMhfcezI1i6H1qa5gwqO2pJn/mQiWUx/IxxC5j1tGMcgXJ
 tU4cJtMccUmB22bOTqJsGhQJyHh09UDql2PIAHdvTZplYBj34R+2fr6LMRC5TUyY
 yXsk/26CAh1ihPK8juytK8D2f7CfFJVe96HXMz8yGWNy2MeaJHh3Nz7E2/hA6L6B
 GzT08EBGmGU/1/L4xJi1uuRLQXMgvWkuo0C65m9jqFah7y82gHetypTBSbLOBcUX
 +pJRcWouZlksbEHydMlbvNrqmJz1zPx9JJFJJrfDzkYUB/O61NxOpaQPZHfhMt01
 Bd8n0wt7OPW9bZL9WrW5W11OhOxNZfuxcr5GmVnzsaALPazFThQm1gZH2NOp6Wr/
 9aWy/lZAuDOrtOnmfcZm
 =OkUs
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - Handle RTAS delay requests in configure_bridge from Russell Currey
 - Refactor the configure_bridge RTAS tokens from Russell Currey
 - Fix definition of SIAR and SDAR registers from Thomas Huth
 - Use privileged SPR number for MMCR2 from Thomas Huth
 - Update LPCR only if it is powernv from Aneesh Kumar K.V
 - Fix the reference bit update when handling hash fault from Aneesh
   Kumar K.V
 - Add missing tlb flush from Aneesh Kumar K.V
 - Add POWER8NVL support to ibm,client-architecture-support call from
   Thomas Huth

* tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
  powerpc/mm/radix: Add missing tlb flush
  powerpc/mm/hash: Fix the reference bit update when handling hash fault
  powerpc/mm/radix: Update LPCR only if it is powernv
  powerpc: Use privileged SPR number for MMCR2
  powerpc: Fix definition of SIAR and SDAR registers
  powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
  powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
2016-06-03 14:20:22 -07:00