From 01a36b6758e723f919420ef20cea5fca1fc06e2b Mon Sep 17 00:00:00 2001
From: Gang He <ghe@suse.com>
Date: Tue, 2 Aug 2016 14:02:07 -0700
Subject: [PATCH 001/111] ocfs2: ensure that dlm lockspace is created by kernel
 module

A customer hit the following bug: the user ran fsck.ocfs2 on the file
system and it exited abnormally, leaving its lockspace (LVB size = 32)
behind in the kernel.  When the user then mounted the file system, the
kernel module did not create a new lockspace (LVB size = 64) through
dlm_new_lockspace() during mount, but reused the existing lockspace
created by the user space tool.  As a result, the user could not mount
the file system from the other nodes, and saw error messages like:

  dlm: 032F5......: config mismatch: 64,0 nodeid 177127961: 32,0
  (mount.ocfs2,26981,46):ocfs2_dlm_init:2995 ERROR: status = -71
  ocfs2_mount_volume:1881 ERROR: status = -71
  ocfs2_fill_super:1236 ERROR: status = -71

It was very difficult for the user to find the root cause, so this
patch addresses the problem.

First, we pass one more flag to dlm_new_lockspace() to make sure the
lockspace is created by the kernel module itself.  This change does not
affect backward compatibility.

Second, a clear error message is reported in the kernel log, making it
easier for the user to find the root cause.

This patch ensures that the dlm lockspace is created by the kernel
module when mounting an ocfs2 file system.  A lockspace can be created
from either user space or kernel space, but lockspaces with the same
name may then have different lvblen lengths/flags.

To avoid mixing the two, we add the DLM_LSFL_NEWEXCL flag, which makes
sure the dlm lockspace is created by the kernel module when mounting.
In addition, if a user space program (ocfs2-tools) is running on a file
system and the user tries to mount that file system in the cluster, the
DLM module returns -EEXIST or -EPROTO.  We give the user a clear error
message so that the user can let the user space tool exit before
mounting the file system again.
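
As a rough user space illustration of the difference an exclusive-create
flag makes, consider the sketch below.  It is not the DLM code: the
registry, the sketch_* names and the flag value are all hypothetical,
and only the reuse-versus-EEXIST behaviour is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; not the kernel's DLM_LSFL_NEWEXCL definition. */
#define SKETCH_LSFL_NEWEXCL 0x1u
#define SKETCH_MAX_LS 4

/* Hypothetical stand-in for a lockspace entry. */
struct sketch_lockspace {
	char name[32];
	int lvblen;
	int in_use;
};

static struct sketch_lockspace sketch_table[SKETCH_MAX_LS];

/*
 * Create a lockspace called @name.  Without the exclusive flag an existing
 * lockspace of the same name is silently reused, whatever its LVB length.
 * With the flag set the call fails with -EEXIST instead.
 */
int sketch_new_lockspace(const char *name, unsigned int flags, int lvblen)
{
	size_t i;

	assert(name != NULL);

	for (i = 0; i < SKETCH_MAX_LS; i++) {
		if (sketch_table[i].in_use &&
		    strcmp(sketch_table[i].name, name) == 0)
			return (flags & SKETCH_LSFL_NEWEXCL) ? -EEXIST : 0;
	}

	for (i = 0; i < SKETCH_MAX_LS; i++) {
		if (!sketch_table[i].in_use) {
			/* Table is zero-initialised, so the name stays terminated. */
			strncpy(sketch_table[i].name, name,
				sizeof(sketch_table[i].name) - 1);
			sketch_table[i].lvblen = lvblen;
			sketch_table[i].in_use = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Return the LVB length recorded for @name, or -ENOENT if it is unknown. */
int sketch_lvblen(const char *name)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_LS; i++) {
		if (sketch_table[i].in_use &&
		    strcmp(sketch_table[i].name, name) == 0)
			return sketch_table[i].lvblen;
	}
	return -ENOENT;
}
```

In this model a "tool" creating the lockspace with lvblen 32 followed by
a non-exclusive "mount" with lvblen 64 quietly keeps the 32-byte
lockspace, while an exclusive "mount" gets -EEXIST and can report it.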

Link: http://lkml.kernel.org/r/1463731940-13044-2-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/stack_user.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index ced70c8139f7..c9e828ec3c8e 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -1007,10 +1007,17 @@ static int user_cluster_connect(struct ocfs2_cluster_connection *conn)
 	lc->oc_type = NO_CONTROLD;
 
 	rc = dlm_new_lockspace(conn->cc_name, conn->cc_cluster_name,
-			       DLM_LSFL_FS, DLM_LVB_LEN,
+			       DLM_LSFL_FS | DLM_LSFL_NEWEXCL, DLM_LVB_LEN,
 			       &ocfs2_ls_ops, conn, &ops_rv, &fsdlm);
-	if (rc)
+	if (rc) {
+		if (rc == -EEXIST || rc == -EPROTO)
+			printk(KERN_ERR "ocfs2: Unable to create the "
+				"lockspace %s (%d), because a ocfs2-tools "
+				"program is running on this file system "
+				"with the same name lockspace\n",
+				conn->cc_name, rc);
 		goto out;
+	}
 
 	if (ops_rv == -EOPNOTSUPP) {
 		lc->oc_type = WITH_CONTROLD;

From 2070ad1aebfff2c26190188844c38e55d2df2ae2 Mon Sep 17 00:00:00 2001
From: Eric Ren <zren@suse.com>
Date: Tue, 2 Aug 2016 14:02:10 -0700
Subject: [PATCH 002/111] ocfs2: retry on ENOSPC if sufficient space in
 truncate log

The testcase "mmaptruncate" in the ocfs2 test suite always fails with an
ENOSPC error on small volumes (say less than 10G).  This testcase
repeatedly performs "extend" and "truncate" on a file: it truncates the
file to 1/2 of its size, and then extends it back to 100% of the size.
The main bitmap quickly runs out of space because the "truncate" code
prevents the truncate log from being flushed by
ocfs2_schedule_truncate_log_flush(osb, 1), while the truncate log may
have cached lots of clusters.

So retry the allocation after flushing the truncate log when ENOSPC is
returned.  Note that we cannot reuse the deleted blocks before the
transaction is committed.  Fortunately, we already have a function that
does this: ocfs2_try_to_free_truncate_log().  We just need to remove
the "static" modifier and move it to the right place.

The "unlock"/"lock" code isn't elegant, but there seems to be no better
option.
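
The retry-once idea can be sketched in user space as follows.  This is
only a model under simplifying assumptions: struct sketch_volume and the
sketch_* helpers are hypothetical, and the locking and journal commit
that the real code has to handle are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a volume: free space plus space parked in a log. */
struct sketch_volume {
	unsigned int free_clusters;   /* available in the main bitmap */
	unsigned int cached_clusters; /* freed, but still in the truncate log */
};

/* Plain reservation from the main bitmap only. */
int sketch_reserve(struct sketch_volume *v, unsigned int wanted)
{
	assert(v != NULL);
	if (v->free_clusters < wanted)
		return -ENOSPC;
	v->free_clusters -= wanted;
	return 0;
}

/*
 * Flush the log if that could satisfy the request.
 * Returns 1 when space was returned to the bitmap, 0 when flushing
 * would not help.
 */
int sketch_try_flush(struct sketch_volume *v, unsigned int needed)
{
	if (v->cached_clusters < needed)
		return 0;
	v->free_clusters += v->cached_clusters;
	v->cached_clusters = 0;
	return 1;
}

/* Reserve, and on the first -ENOSPC flush the log and try once more. */
int sketch_reserve_with_retry(struct sketch_volume *v, unsigned int wanted)
{
	int retried = 0;
	int status;

retry:
	status = sketch_reserve(v, wanted);
	if (status == -ENOSPC && !retried) {
		retried = 1;
		if (sketch_try_flush(v, wanted) == 1)
			goto retry;
	}
	return status;
}
```

The "retried" flag bounds the loop to a single extra attempt, so a
volume that is really full still fails with -ENOSPC.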

[zren@suse.com: locking fix]
  Link: http://lkml.kernel.org/r/1468031546-4797-1-git-send-email-zren@suse.com
Link: http://lkml.kernel.org/r/1466586469-5541-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/alloc.c    | 37 +++++++++++++++++++++++++++++++++++++
 fs/ocfs2/alloc.h    |  2 ++
 fs/ocfs2/aops.c     | 37 -------------------------------------
 fs/ocfs2/suballoc.c | 20 +++++++++++++++++++-
 4 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 460c0cedab3a..7dabbc31060e 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct ocfs2_super *osb,
 	}
 }
 
+/*
+ * Try to flush truncate logs if we can free enough clusters from it.
+ * As for return value, "< 0" means error, "0" no space and "1" means
+ * we have freed enough spaces and let the caller try to allocate again.
+ */
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+					unsigned int needed)
+{
+	tid_t target;
+	int ret = 0;
+	unsigned int truncated_clusters;
+
+	inode_lock(osb->osb_tl_inode);
+	truncated_clusters = osb->truncated_clusters;
+	inode_unlock(osb->osb_tl_inode);
+
+	/*
+	 * Check whether we can succeed in allocating if we free
+	 * the truncate log.
+	 */
+	if (truncated_clusters < needed)
+		goto out;
+
+	ret = ocfs2_flush_truncate_log(osb);
+	if (ret) {
+		mlog_errno(ret);
+		goto out;
+	}
+
+	if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
+		jbd2_log_wait_commit(osb->journal->j_journal, target);
+		ret = 1;
+	}
+out:
+	return ret;
+}
+
 static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
 				       int slot_num,
 				       struct inode **tl_inode,
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index f3dc1b0dfffc..4a5152ec88a3 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
 			      u64 start_blk,
 			      unsigned int num_clusters);
 int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
+int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
+				   unsigned int needed);
 
 /*
  * Process local structure which describes the block unlinks done
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index af2adfcb0f6f..98d36548153d 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, struct buffer_head *di_bh,
 	return ret;
 }
 
-/*
- * Try to flush truncate logs if we can free enough clusters from it.
- * As for return value, "< 0" means error, "0" no space and "1" means
- * we have freed enough spaces and let the caller try to allocate again.
- */
-static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
-					  unsigned int needed)
-{
-	tid_t target;
-	int ret = 0;
-	unsigned int truncated_clusters;
-
-	inode_lock(osb->osb_tl_inode);
-	truncated_clusters = osb->truncated_clusters;
-	inode_unlock(osb->osb_tl_inode);
-
-	/*
-	 * Check whether we can succeed in allocating if we free
-	 * the truncate log.
-	 */
-	if (truncated_clusters < needed)
-		goto out;
-
-	ret = ocfs2_flush_truncate_log(osb);
-	if (ret) {
-		mlog_errno(ret);
-		goto out;
-	}
-
-	if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
-		jbd2_log_wait_commit(osb->journal->j_journal, target);
-		ret = 1;
-	}
-out:
-	return ret;
-}
-
 int ocfs2_write_begin_nolock(struct address_space *mapping,
 			     loff_t pos, unsigned len, ocfs2_write_type_t type,
 			     struct page **pagep, void **fsdata,
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 2f19aeec5482..ea47120a85ff 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1164,7 +1164,8 @@ static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 					     int flags,
 					     struct ocfs2_alloc_context **ac)
 {
-	int status;
+	int status, ret = 0;
+	int retried = 0;
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -1189,7 +1190,24 @@ static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 	}
 
 	if (status == -ENOSPC) {
+retry:
 		status = ocfs2_reserve_cluster_bitmap_bits(osb, *ac);
+		/* Retry if there is sufficient space cached in truncate log */
+		if (status == -ENOSPC && !retried) {
+			retried = 1;
+			ocfs2_inode_unlock((*ac)->ac_inode, 1);
+			inode_unlock((*ac)->ac_inode);
+
+			ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
+			if (ret == 1)
+				goto retry;
+
+			if (ret < 0)
+				mlog_errno(ret);
+
+			inode_lock((*ac)->ac_inode);
+			ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+		}
 		if (status < 0) {
 			if (status != -ENOSPC)
 				mlog_errno(status);

From 86b652b93adb57d8fed8edd532ed2eb8a791950d Mon Sep 17 00:00:00 2001
From: piaojun <piaojun@huawei.com>
Date: Tue, 2 Aug 2016 14:02:13 -0700
Subject: [PATCH 003/111] ocfs2/dlm: disable BUG_ON when
 DLM_LOCK_RES_DROPPING_REF is cleared before dlm_deref_lockres_done_handler

We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is
unexpectedly cleared, as described below.  To solve the bug, we disable
the BUG_ON and purge lockres in dlm_do_local_recovery_cleanup.

Node 1                               Node 2(master)
dlm_purge_lockres
                                     dlm_deref_lockres_handler

                                     DLM_LOCK_RES_SETREF_INPROG is set
                                     response DLM_DEREF_RESPONSE_INPROG

receive DLM_DEREF_RESPONSE_INPROG
stop purging in dlm_purge_lockres
and wait for DLM_DEREF_RESPONSE_DONE

                                     dispatch dlm_deref_lockres_worker
                                     response DLM_DEREF_RESPONSE_DONE

receive DLM_DEREF_RESPONSE_DONE and
prepare to purge lockres

                                     Node 2 goes down

find Node2 down and do local
clean up for Node2:
dlm_do_local_recovery_cleanup
  -> clear DLM_LOCK_RES_DROPPING_REF

when purging lockres, BUG_ON happens
because DLM_LOCK_RES_DROPPING_REF is clear:
dlm_deref_lockres_done_handler
  ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

[akpm@linux-foundation.org: fix duplicated write to `ret']
Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
Link: http://lkml.kernel.org/r/57845055.9080702@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/dlm/dlmmaster.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 13719d3f35f8..525dc06468c4 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2416,7 +2416,17 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
 	}
 
 	spin_lock(&res->spinlock);
-	BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
+	if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+		spin_unlock(&res->spinlock);
+		spin_unlock(&dlm->spinlock);
+		mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
+			"but it is already derefed!\n", dlm->name,
+			res->lockname.len, res->lockname.name, node);
+		dlm_lockres_put(res);
+		ret = 0;
+		goto done;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "%s: Removing res %.*s from purgelist\n",
 			dlm->name, res->lockname.len, res->lockname.name);
@@ -2456,7 +2466,6 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
 	spin_unlock(&dlm->spinlock);
 
 	ret = 0;
-
 done:
 	dlm_put(dlm);
 	return ret;

From 309e91911daede6adde0364f489e69909c3f6894 Mon Sep 17 00:00:00 2001
From: piaojun <piaojun@huawei.com>
Date: Tue, 2 Aug 2016 14:02:16 -0700
Subject: [PATCH 004/111] ocfs2/dlm: solve a BUG when deref failed in
 dlm_drop_lockres_ref

We found a BUG situation in which lockres is migrated during deref, as
described below.  To solve the BUG, we can purge lockres directly when
the other node reports that this node did not have a ref.  Additionally,
we had better purge lockres if the master goes down, as no one will
respond with deref done.

Node 1                  Node 2(old master)             Node3(new master)
dlm_purge_lockres
send deref to N2

                        leave domain
                        migrate lockres to N3
                                                       finish migration
                                                       send do assert
                                                       master to N1

receive do assert msg
from N3, but can not
find lockres because
DROPPING_REF is set,
so the owner is still
N2.

                        receive deref from N1
                        and response -EINVAL
                        because lockres is migrated

BUG when receive -EINVAL
in dlm_drop_lockres_ref

Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear the refmap bit right now")

Link: http://lkml.kernel.org/r/57845103.3070406@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/dlm/dlmmaster.c |  9 ++++++---
 fs/ocfs2/dlm/dlmthread.c | 13 +++++++++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 525dc06468c4..553d220df406 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2276,9 +2276,12 @@ int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, struct dlm_lock_resource *res)
 		mlog(ML_ERROR, "%s: res %.*s, DEREF to node %u got %d\n",
 		     dlm->name, namelen, lockname, res->owner, r);
 		dlm_print_one_lock_resource(res);
-		BUG();
-	}
-	return ret ? ret : r;
+		if (r == -ENOMEM)
+			BUG();
+	} else
+		ret = r;
+
+	return ret;
 }
 
 int dlm_deref_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 68d239ba0c63..ce397229acc0 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -175,6 +175,15 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 	     res->lockname.len, res->lockname.name, master);
 
 	if (!master) {
+		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
+			mlog(ML_NOTICE, "%s: res %.*s already in "
+				"DLM_LOCK_RES_DROPPING_REF state\n",
+				dlm->name, res->lockname.len,
+				res->lockname.name);
+			spin_unlock(&res->spinlock);
+			return;
+		}
+
 		res->state |= DLM_LOCK_RES_DROPPING_REF;
 		/* drop spinlock...  retake below */
 		spin_unlock(&res->spinlock);
@@ -203,8 +212,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 		dlm->purge_count--;
 	}
 
-	if (!master && ret != 0) {
-		mlog(0, "%s: deref %.*s in progress or master goes down\n",
+	if (!master && ret == DLM_DEREF_RESPONSE_INPROG) {
+		mlog(0, "%s: deref %.*s in progress\n",
 			dlm->name, res->lockname.len, res->lockname.name);
 		spin_unlock(&res->spinlock);
 		return;

From ee8f7fcbe638b07e8d1c3dc98e8be35e56306d05 Mon Sep 17 00:00:00 2001
From: piaojun <piaojun@huawei.com>
Date: Tue, 2 Aug 2016 14:02:19 -0700
Subject: [PATCH 005/111] ocfs2/dlm: continue to purge recovery lockres when
 recovery master goes down

We found a dlm-blocked situation caused by successive failures of
recovery masters, as described below.  To solve this problem, we should
purge the recovery lock once we detect that the recovery master has
gone down.

N3                      N2                   N1(reco master)
                        go down
                                             pick up recovery lock and
                                             begin recovering for N2

                                             go down

pick up recovery
lock failed, then
purge it:
dlm_purge_lockres
  ->DROPPING_REF is set

send deref to N1 failed,
recovery lock is not purged

find N1 went down, begin
recovering for N1, but
blocked in dlm_do_recovery
as DROPPING_REF is set:
dlm_do_recovery
  ->dlm_pick_recovery_master
    ->dlmlock
      ->dlm_get_lock_resource
        ->__dlm_wait_on_lockres_flags(tmpres,
	  	DLM_LOCK_RES_DROPPING_REF);

Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/dlm/dlmcommon.h   |  2 ++
 fs/ocfs2/dlm/dlmmaster.c   | 37 +++------------------------
 fs/ocfs2/dlm/dlmrecovery.c | 29 +++++++++++++++------
 fs/ocfs2/dlm/dlmthread.c   | 52 +++++++++++++++++++++++++++++++++++---
 4 files changed, 74 insertions(+), 46 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 8107d0d0c3f6..e9f3705c4c9f 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -1004,6 +1004,8 @@ int dlm_finalize_reco_handler(struct o2net_msg *msg, u32 len, void *data,
 int dlm_do_master_requery(struct dlm_ctxt *dlm, struct dlm_lock_resource *res,
 			  u8 nodenum, u8 *real_master);
 
+void __dlm_do_purge_lockres(struct dlm_ctxt *dlm,
+		struct dlm_lock_resource *res);
 
 int dlm_dispatch_assert_master(struct dlm_ctxt *dlm,
 			       struct dlm_lock_resource *res,
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 553d220df406..6ea06f8a7d29 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2425,51 +2425,20 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
 		mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
 			"but it is already derefed!\n", dlm->name,
 			res->lockname.len, res->lockname.name, node);
-		dlm_lockres_put(res);
 		ret = 0;
 		goto done;
 	}
 
-	if (!list_empty(&res->purge)) {
-		mlog(0, "%s: Removing res %.*s from purgelist\n",
-			dlm->name, res->lockname.len, res->lockname.name);
-		list_del_init(&res->purge);
-		dlm_lockres_put(res);
-		dlm->purge_count--;
-	}
-
-	if (!__dlm_lockres_unused(res)) {
-		mlog(ML_ERROR, "%s: res %.*s in use after deref\n",
-			dlm->name, res->lockname.len, res->lockname.name);
-		__dlm_print_one_lock_resource(res);
-		BUG();
-	}
-
-	__dlm_unhash_lockres(dlm, res);
-
-	spin_lock(&dlm->track_lock);
-	if (!list_empty(&res->tracking))
-		list_del_init(&res->tracking);
-	else {
-		mlog(ML_ERROR, "%s: Resource %.*s not on the Tracking list\n",
-		     dlm->name, res->lockname.len, res->lockname.name);
-		__dlm_print_one_lock_resource(res);
-	}
-	spin_unlock(&dlm->track_lock);
-
-	/* lockres is not in the hash now. drop the flag and wake up
-	 * any processes waiting in dlm_get_lock_resource.
-	 */
-	res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+	__dlm_do_purge_lockres(dlm, res);
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 
-	dlm_lockres_put(res);
-
 	spin_unlock(&dlm->spinlock);
 
 	ret = 0;
 done:
+	if (res)
+		dlm_lockres_put(res);
 	dlm_put(dlm);
 	return ret;
 }
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f6b313898763..dd5cb8bcefd1 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2343,6 +2343,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 	struct dlm_lock_resource *res;
 	int i;
 	struct hlist_head *bucket;
+	struct hlist_node *tmp;
 	struct dlm_lock *lock;
 
 
@@ -2365,7 +2366,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 	 */
 	for (i = 0; i < DLM_HASH_BUCKETS; i++) {
 		bucket = dlm_lockres_hash(dlm, i);
-		hlist_for_each_entry(res, bucket, hash_node) {
+		hlist_for_each_entry_safe(res, tmp, bucket, hash_node) {
  			/* always prune any $RECOVERY entries for dead nodes,
  			 * otherwise hangs can occur during later recovery */
 			if (dlm_is_recovery_lock(res->lockname.name,
@@ -2386,8 +2387,17 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 						break;
 					}
 				}
-				dlm_lockres_clear_refmap_bit(dlm, res,
-						dead_node);
+
+				if ((res->owner == dead_node) &&
+							(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+					dlm_lockres_get(res);
+					__dlm_do_purge_lockres(dlm, res);
+					spin_unlock(&res->spinlock);
+					wake_up(&res->wq);
+					dlm_lockres_put(res);
+					continue;
+				} else if (res->owner == dlm->node_num)
+					dlm_lockres_clear_refmap_bit(dlm, res, dead_node);
 				spin_unlock(&res->spinlock);
 				continue;
 			}
@@ -2398,14 +2408,17 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
 				if (res->state & DLM_LOCK_RES_DROPPING_REF) {
 					mlog(0, "%s:%.*s: owned by "
 						"dead node %u, this node was "
-						"dropping its ref when it died. "
-						"continue, dropping the flag.\n",
+						"dropping its ref when master died. "
+						"continue, purging the lockres.\n",
 						dlm->name, res->lockname.len,
 						res->lockname.name, dead_node);
+					dlm_lockres_get(res);
+					__dlm_do_purge_lockres(dlm, res);
+					spin_unlock(&res->spinlock);
+					wake_up(&res->wq);
+					dlm_lockres_put(res);
+					continue;
 				}
-				res->state &= ~DLM_LOCK_RES_DROPPING_REF;
-				dlm_move_lockres_to_recovery_list(dlm,
-						res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index ce397229acc0..838a06d4066a 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -160,6 +160,52 @@ void dlm_lockres_calc_usage(struct dlm_ctxt *dlm,
 	spin_unlock(&dlm->spinlock);
 }
 
+/*
+ * Do the real purge work:
+ *     unhash the lockres, and
+ *     clear flag DLM_LOCK_RES_DROPPING_REF.
+ * It requires dlm and lockres spinlock to be taken.
+ */
+void __dlm_do_purge_lockres(struct dlm_ctxt *dlm,
+		struct dlm_lock_resource *res)
+{
+	assert_spin_locked(&dlm->spinlock);
+	assert_spin_locked(&res->spinlock);
+
+	if (!list_empty(&res->purge)) {
+		mlog(0, "%s: Removing res %.*s from purgelist\n",
+		     dlm->name, res->lockname.len, res->lockname.name);
+		list_del_init(&res->purge);
+		dlm_lockres_put(res);
+		dlm->purge_count--;
+	}
+
+	if (!__dlm_lockres_unused(res)) {
+		mlog(ML_ERROR, "%s: res %.*s in use after deref\n",
+		     dlm->name, res->lockname.len, res->lockname.name);
+		__dlm_print_one_lock_resource(res);
+		BUG();
+	}
+
+	__dlm_unhash_lockres(dlm, res);
+
+	spin_lock(&dlm->track_lock);
+	if (!list_empty(&res->tracking))
+		list_del_init(&res->tracking);
+	else {
+		mlog(ML_ERROR, "%s: Resource %.*s not on the Tracking list\n",
+		     dlm->name, res->lockname.len, res->lockname.name);
+		__dlm_print_one_lock_resource(res);
+	}
+	spin_unlock(&dlm->track_lock);
+
+	/*
+	 * lockres is not in the hash now. drop the flag and wake up
+	 * any processes waiting in dlm_get_lock_resource.
+	 */
+	res->state &= ~DLM_LOCK_RES_DROPPING_REF;
+}
+
 static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 			     struct dlm_lock_resource *res)
 {
@@ -176,10 +222,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 
 	if (!master) {
 		if (res->state & DLM_LOCK_RES_DROPPING_REF) {
-			mlog(ML_NOTICE, "%s: res %.*s already in "
-				"DLM_LOCK_RES_DROPPING_REF state\n",
-				dlm->name, res->lockname.len,
-				res->lockname.name);
+			mlog(ML_NOTICE, "%s: res %.*s already in DLM_LOCK_RES_DROPPING_REF state\n",
+				dlm->name, res->lockname.len, res->lockname.name);
 			spin_unlock(&res->spinlock);
 			return;
 		}

From c5f88bd29ab42d5d1e77085b5f69d5c6da20324e Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@oracle.com>
Date: Tue, 2 Aug 2016 14:02:22 -0700
Subject: [PATCH 006/111] mm: fail prefaulting if page table allocation fails

I ran into this:

    BUG: sleeping function called from invalid context at mm/page_alloc.c:3784
    in_atomic(): 0, irqs_disabled(): 0, pid: 1434, name: trinity-c1
    2 locks held by trinity-c1/1434:
     #0:  (&mm->mmap_sem){......}, at: [<ffffffff810ce31e>] __do_page_fault+0x1ce/0x8f0
     #1:  (rcu_read_lock){......}, at: [<ffffffff81378f86>] filemap_map_pages+0xd6/0xdd0

    CPU: 0 PID: 1434 Comm: trinity-c1 Not tainted 4.7.0+ #58
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
      dump_stack+0x65/0x84
      panic+0x185/0x2dd
      ___might_sleep+0x51c/0x600
      __might_sleep+0x90/0x1a0
      __alloc_pages_nodemask+0x5b1/0x2160
      alloc_pages_current+0xcc/0x370
      pte_alloc_one+0x12/0x90
      __pte_alloc+0x1d/0x200
      alloc_set_pte+0xe3e/0x14a0
      filemap_map_pages+0x42b/0xdd0
      handle_mm_fault+0x17d5/0x28b0
      __do_page_fault+0x310/0x8f0
      trace_do_page_fault+0x18d/0x310
      do_async_page_fault+0x27/0xa0
      async_page_fault+0x28/0x30

The important bit from the above is that filemap_map_pages() is calling
into the page allocator while holding rcu_read_lock (sleeping is not
allowed inside RCU read-side critical sections).

According to Kirill Shutemov, the prefaulting code in do_fault_around()
is supposed to take care of this, but missing error handling means that
the allocation failure can go unnoticed.

We don't need to return VM_FAULT_OOM (or any other error) here, since we
can just let the normal fault path try again.
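
The shape of the fix, checking the preallocation before entering the
section that must not sleep and quietly falling back on failure, can be
sketched in user space as below.  Everything here is hypothetical
(struct sketch_fault, the sketch_* functions, the injected allocator);
it only models the control flow, not the mm code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical outcomes of the optional "map pages around" step. */
enum sketch_result { SKETCH_MAPPED_AROUND, SKETCH_FALLBACK };

/* Hypothetical stand-in for the fault state. */
struct sketch_fault {
	void *prealloc_pte; /* table allocated ahead of the atomic section */
	int pmd_none;       /* non-zero when a new table would be needed */
};

/*
 * Preallocate while sleeping is still allowed.  If that fails, skip the
 * optional work and report a fallback instead of an error, so that the
 * caller can take its normal, slower path.
 */
enum sketch_result sketch_fault_around(struct sketch_fault *f,
				       void *(*alloc)(size_t))
{
	assert(f != NULL && alloc != NULL);

	if (f->pmd_none) {
		f->prealloc_pte = alloc(64);
		if (!f->prealloc_pte)
			return SKETCH_FALLBACK;
	}

	/* The non-sleeping section would run here, using prealloc_pte. */
	return SKETCH_MAPPED_AROUND;
}

/* Allocator that always fails, to simulate memory pressure. */
void *sketch_failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```

Passing the allocator in as a function pointer is only a device for
simulating failure in this sketch.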

Fixes: 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/1469708107-11868-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 4425b6059339..04004834e985 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3133,6 +3133,8 @@ static int do_fault_around(struct fault_env *fe, pgoff_t start_pgoff)
 
 	if (pmd_none(*fe->pmd)) {
 		fe->prealloc_pte = pte_alloc_one(fe->vma->vm_mm, fe->address);
+		if (!fe->prealloc_pte)
+			goto out;
 		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 

From 1a8018fb4c6976559c3f04bcf760822381be501d Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 2 Aug 2016 14:02:25 -0700
Subject: [PATCH 007/111] mm: move swap-in anonymous page into active list

Every swapped-in anonymous page starts from the head of the inactive
lru list.  It should get activated unconditionally when the VM decides
to reclaim it, because the page table entry for the page usually has the
accessed bit set.  Thus, its window for getting a new reference is
2 * NR_inactive + NR_active, while for other pages it is NR_inactive +
NR_active.

It's not fair that it has more chance to be referenced than a newly
allocated page, which starts from the head of the active lru list.
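
To make the two window sizes concrete, here is a small worked example.
The function names and the list sizes used with them are made up for
illustration; only the two formulas come from the text above.

```c
#include <assert.h>

/*
 * Window for a page that starts on the inactive list, gets activated at
 * the first reclaim scan, and then ages through the active and inactive
 * lists once more: 2 * NR_inactive + NR_active.
 */
unsigned long sketch_window_inactive_start(unsigned long nr_inactive,
					   unsigned long nr_active)
{
	return 2 * nr_inactive + nr_active;
}

/*
 * Window for a page that starts on the active list and ages through the
 * active and then the inactive list: NR_inactive + NR_active.
 */
unsigned long sketch_window_active_start(unsigned long nr_inactive,
					 unsigned long nr_active)
{
	return nr_inactive + nr_active;
}
```

With, say, 1000 inactive and 3000 active pages, the first window is 5000
pages and the second is 4000, so the swapped-in page gets an extra
NR_inactive worth of LRU time.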

Johannes:

: The page can still have a valid copy on the swap device, so preferring to
: reclaim that page over a fresh one could make sense.  But as you point
: out, having it start inactive instead of active actually ends up giving it
: *more* LRU time, and that seems to be without justification.

Rik:

: The reason newly read in swap cache pages start on the inactive list is
: that we do some amount of read-around, and do not know which pages will
: get used.
:
: However, immediately activating the ones that DO get used, like your patch
: does, is the right thing to do.

Link: http://lkml.kernel.org/r/1469762740-17860-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory.c b/mm/memory.c
index 04004834e985..83be99d9d8a1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2642,6 +2642,7 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	if (page == swapcache) {
 		do_page_add_anon_rmap(page, vma, fe->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true, false);
+		activate_page(page);
 	} else { /* ksm created a completely new copy */
 		page_add_new_anon_rmap(page, vma, fe->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);

From 117dec978cf64e8e96f13d0cf4891ff77c9acf55 Mon Sep 17 00:00:00 2001
From: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Date: Tue, 2 Aug 2016 14:02:28 -0700
Subject: [PATCH 008/111] tools/testing/radix-tree/linux/gfp.h: fix bitrotted
 value

Apparently, the tools/testing version dates to a few flags ago, and
we've sprouted 4 new ones since.  Keep in sync with the value in the
main tree...

Link: http://lkml.kernel.org/r/23400.1469702675@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 tools/testing/radix-tree/linux/gfp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/radix-tree/linux/gfp.h b/tools/testing/radix-tree/linux/gfp.h
index 0e37f7a760eb..5201b915f631 100644
--- a/tools/testing/radix-tree/linux/gfp.h
+++ b/tools/testing/radix-tree/linux/gfp.h
@@ -1,7 +1,7 @@
 #ifndef _GFP_H
 #define _GFP_H
 
-#define __GFP_BITS_SHIFT 22
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 #define __GFP_WAIT 1
 #define __GFP_ACCOUNT 0

From 649920c6ab93429b94bc7c1aa7c0e8395351be32 Mon Sep 17 00:00:00 2001
From: Jia He <hejianet@gmail.com>
Date: Tue, 2 Aug 2016 14:02:31 -0700
Subject: [PATCH 009/111] mm/hugetlb: avoid soft lockup in set_max_huge_pages()

On powerpc servers with large memory (32TB), we observed several soft
lockups for hugepage under stress tests.

The call traces are as follows:
1.
get_page_from_freelist+0x2d8/0xd50
__alloc_pages_nodemask+0x180/0xc20
alloc_fresh_huge_page+0xb0/0x190
set_max_huge_pages+0x164/0x3b0

2.
prep_new_huge_page+0x5c/0x100
alloc_fresh_huge_page+0xc8/0x190
set_max_huge_pages+0x164/0x3b0

This patch fixes such soft lockups.  It is safe to call cond_resched()
there because it is outside the spin_lock/unlock section.

Link: http://lkml.kernel.org/r/1469674442-14848-1-git-send-email-hejianet@gmail.com
Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/hugetlb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f904246a8fd5..619e00d82c5d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2216,6 +2216,10 @@ static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
 		 * and reducing the surplus.
 		 */
 		spin_unlock(&hugetlb_lock);
+
+		/* yield cpu to avoid soft lockup */
+		cond_resched();
+
 		if (hstate_is_gigantic(h))
 			ret = alloc_fresh_gigantic_page(h, nodes_allowed);
 		else

From 4e666314d286765a9e61818b488c7372326654ec Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 2 Aug 2016 14:02:34 -0700
Subject: [PATCH 010/111] mm, hugetlb: fix huge_pte_alloc BUG_ON

Zhong Jiang has reported hitting a BUG_ON in huge_pte_alloc when he
runs his database load with memory online and offline running in
parallel.  The reason is that huge_pmd_share might detect a shared pmd
which is currently being migrated and so it has a migration pte which is
!pte_huge.

There doesn't seem to be any easy way to prevent the race and in
fact seeing the migration swap entry is not harmful.  Both callers of
huge_pte_alloc are prepared to handle it.  copy_hugetlb_page_range
will copy the swap entry and make it COW if needed.  hugetlb_fault will
back off, so the page fault is retried if the page is still under
migration, and waits for its completion in hugetlb_fault.

That means that the BUG_ON is wrong and we should update it.  Let's
simply check that all present ptes are pte_huge instead.

Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: zhongjiang <zhongjiang@huawei.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
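Reviewer note (dropped by git-am, not part of the commit): a toy model
of why the old predicate fired on migration entries while the new one
does not.  The enum and the model_* helpers are hypothetical; real pte
encodings are architecture specific.

```c
#include <stdbool.h>

/* Hypothetical, simplified classification of what a pte slot can hold. */
enum pte_kind {
	PTE_NONE,	/* empty slot */
	PTE_HUGE,	/* present huge mapping */
	PTE_NORMAL,	/* present but not huge: a real bug here */
	PTE_MIGRATION,	/* non-present swap-style migration entry */
};

static bool model_pte_none(enum pte_kind k)
{
	return k == PTE_NONE;
}

static bool model_pte_present(enum pte_kind k)
{
	return k == PTE_HUGE || k == PTE_NORMAL;
}

static bool model_pte_huge(enum pte_kind k)
{
	return k == PTE_HUGE;
}

/* Old condition: anything that is not empty must be huge. */
static bool old_check_fires(enum pte_kind k)
{
	return !model_pte_none(k) && !model_pte_huge(k);
}

/* New condition: only present entries must be huge. */
static bool new_check_fires(enum pte_kind k)
{
	return model_pte_present(k) && !model_pte_huge(k);
}
```

In this model both checks still catch PTE_NORMAL; only the migration
case changes.
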
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 619e00d82c5d..ef968306fd5b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4310,7 +4310,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
+	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
 
 	return pte;
 }

From d6507ff5331c002430cc20ab25922479453baae7 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 2 Aug 2016 14:02:37 -0700
Subject: [PATCH 011/111] memcg: put soft limit reclaim out of way if the
 excess tree is empty

We've had a report about soft lockups caused by lock bouncing in the
soft reclaim path:

  BUG: soft lockup - CPU#0 stuck for 22s! [kav4proxy-kavic:3128]
  RIP: 0010:[<ffffffff81469798>]  [<ffffffff81469798>] _raw_spin_lock+0x18/0x20
  Call Trace:
    mem_cgroup_soft_limit_reclaim+0x25a/0x280
    shrink_zones+0xed/0x200
    do_try_to_free_pages+0x74/0x320
    try_to_free_pages+0x112/0x180
    __alloc_pages_slowpath+0x3ff/0x820
    __alloc_pages_nodemask+0x1e9/0x200
    alloc_pages_vma+0xe1/0x290
    do_wp_page+0x19f/0x840
    handle_pte_fault+0x1cd/0x230
    do_page_fault+0x1fd/0x4c0
    page_fault+0x25/0x30

There are no memcgs created so there cannot be any in the soft limit
excess obviously:

  [...]
  memory  0       1       1

so all this just seems to be mem_cgroup_largest_soft_limit_node trying
to get spin_lock_irq(&mctz->lock) just to find out that the soft limit
excess tree is empty.  This is just a pointless waste of cycles and
cache line bouncing during heavy parallel reclaim on large machines.
The particular machine wasn't very healthy and was most probably
suffering from a memory leak which just caused the memory reclaim to
thrash heavily.  But bouncing on the lock certainly didn't help...

Fix this with an optimistic lockless check and bail out early if the
tree is empty.  This is theoretically racy but that shouldn't matter all
that much.  First of all, soft limit is a best effort feature, it is
slowly getting deprecated and its usage should be really scarce.
Bouncing on a lock without a good reason is surely a much bigger
problem, especially on machines with many CPUs.

Link: http://lkml.kernel.org/r/1470073277-1056-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
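Reviewer note (dropped by git-am, not part of the commit): a user-space
sketch of the "peek without the lock, then lock only if needed" pattern.
struct excess_tree, the atomic_flag spinlock and the acquisition counter
are assumptions made for illustration; the kernel code uses an rbtree
root and spin_lock_irq().

```c
#include <stdatomic.h>
#include <stddef.h>

struct excess_tree {
	atomic_flag lock;		/* toy spinlock */
	_Atomic(void *) root;		/* NULL means the tree is empty */
	unsigned long lock_acquisitions;	/* illustration only */
};

static void tree_lock(struct excess_tree *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock,
						 memory_order_acquire))
		;	/* spin */
	t->lock_acquisitions++;
}

static void tree_unlock(struct excess_tree *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Returns 1 if there was something to reclaim from, 0 otherwise. */
static unsigned long soft_reclaim(struct excess_tree *t)
{
	unsigned long reclaimed = 0;

	/* Optimistic check: racy, but an empty tree costs no lock traffic. */
	if (atomic_load_explicit(&t->root, memory_order_relaxed) == NULL)
		return 0;

	tree_lock(t);
	/* Re-check under the lock; the tree may have emptied meanwhile. */
	if (atomic_load_explicit(&t->root, memory_order_relaxed) != NULL)
		reclaimed = 1;
	tree_unlock(t);

	return reclaimed;
}
```

The re-check under the lock is what keeps the race benign: a stale
"non-empty" answer only costs one lock round trip, and a stale "empty"
answer only skips one best-effort reclaim pass.
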
 mm/memcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c265212bec8c..66beca1ad92f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2559,6 +2559,15 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		return 0;
 
 	mctz = soft_limit_tree_node(pgdat->node_id);
+
+	/*
+	 * Do not even bother to check the largest node if the root
+	 * is empty. Do it lockless to prevent lock bouncing. Races
+	 * are acceptable as soft limit is best effort anyway.
+	 */
+	if (RB_EMPTY_ROOT(&mctz->rb_root))
+		return 0;
+
 	/*
 	 * This loop can run a while, specially if mem_cgroup's continuously
 	 * keep exceeding their soft limit and putting the system under

From 4a3d308d6674fabf213bce9c1a661ef43a85e515 Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:40 -0700
Subject: [PATCH 012/111] mm/kasan: fix corruptions and false positive reports

Once an object is put into quarantine, we no longer own it, i.e. the
object could leave the quarantine and be reallocated.  So calling
set_track() after quarantine_put() may corrupt slab objects.

 BUG kmalloc-4096 (Not tainted): Poison overwritten
 -----------------------------------------------------------------------------
 Disabling lock debugging due to kernel taint
 INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b
...
 INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492
  __slab_free+0x1d6/0x2e0
  ___cache_free+0xb6/0xd0
  qlist_free_all+0x83/0x100
  quarantine_reduce+0x177/0x1b0
  kasan_kmalloc+0xf3/0x100
  kasan_slab_alloc+0x12/0x20
  kmem_cache_alloc+0x109/0x3e0
  mmap_region+0x53e/0xe40
  do_mmap+0x70f/0xa50
  vm_mmap_pgoff+0x147/0x1b0
  SyS_mmap_pgoff+0x2c7/0x5b0
  SyS_mmap+0x1b/0x30
  do_syscall_64+0x1a0/0x4e0
  return_from_SYSCALL_64+0x0/0x7a
 INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x          (null) flags=0x8000000000004080
 INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588
 Redzone ffff8804540de840: bb bb bb bb bb bb bb bb                          ........
 Object ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc  kkkkkkkk.R....`.

Similarly, poisoning after the quarantine_put() leads to false positive
use-after-free reports:

 BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr ffff880405c540a0
 Read of size 8 by task trinity-c0/3036
 CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9
 Call Trace:
   dump_stack+0x68/0x96
   kasan_report_error+0x222/0x600
   __asan_report_load8_noabort+0x61/0x70
   anon_vma_interval_tree_insert+0x304/0x430
   anon_vma_chain_link+0x91/0xd0
   anon_vma_clone+0x136/0x3f0
   anon_vma_fork+0x81/0x4c0
   copy_process.part.47+0x2c43/0x5b20
   _do_fork+0x16d/0xbd0
   SyS_clone+0x19/0x20
   do_syscall_64+0x1a0/0x4e0
   entry_SYSCALL64_slow_path+0x25/0x25

Fix this by putting an object in the quarantine after all other
operations.

Fixes: 80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB")
Link: http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/kasan/kasan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index b6f99e81bfeb..3019cecc0833 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -543,9 +543,9 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
 		switch (alloc_info->state) {
 		case KASAN_STATE_ALLOC:
 			alloc_info->state = KASAN_STATE_QUARANTINE;
-			quarantine_put(free_info, cache);
 			set_track(&free_info->track, GFP_NOWAIT);
 			kasan_poison_slab_free(cache, object);
+			quarantine_put(free_info, cache);
 			return true;
 		case KASAN_STATE_QUARANTINE:
 		case KASAN_STATE_FREE:

From 4b3ec5a3f4b1d5c9d64b9ab704042400d050d432 Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:43 -0700
Subject: [PATCH 013/111] mm/kasan: don't reduce quarantine in atomic contexts

Currently we call quarantine_reduce() for any ___GFP_KSWAPD_RECLAIM
(implied by __GFP_RECLAIM) allocation, so basically we call it on almost
every allocation.  quarantine_reduce() is sometimes a heavy operation,
and calling it with interrupts disabled may trigger a hard LOCKUP:

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp: 1411258
 Call Trace:
  <NMI>   dump_stack+0x68/0x96
   watchdog_overflow_callback+0x15b/0x190
   __perf_event_overflow+0x1b1/0x540
   perf_event_overflow+0x14/0x20
   intel_pmu_handle_irq+0x36a/0xad0
   perf_event_nmi_handler+0x2c/0x50
   nmi_handle+0x128/0x480
   default_do_nmi+0xb2/0x210
   do_nmi+0x1aa/0x220
   end_repeat_nmi+0x1a/0x1e
  <<EOE>>   __kernel_text_address+0x86/0xb0
   print_context_stack+0x7b/0x100
   dump_trace+0x12b/0x350
   save_stack_trace+0x2b/0x50
   set_track+0x83/0x140
   free_debug_processing+0x1aa/0x420
   __slab_free+0x1d6/0x2e0
   ___cache_free+0xb6/0xd0
   qlist_free_all+0x83/0x100
   quarantine_reduce+0x177/0x1b0
   kasan_kmalloc+0xf3/0x100

Call quarantine_reduce() only if direct reclaim is allowed.

Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/1470062715-14077-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
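Reviewer note (dropped by git-am, not part of the commit): a toy model
of the old and new conditions.  The MODEL_* names and bit values are
placeholders chosen for this sketch, not the kernel's definitions; the
only property relied on is that the "reclaim" mask covers both the
direct-reclaim and the kswapd-reclaim bits.

```c
#include <stdbool.h>

#define MODEL_DIRECT_RECLAIM	0x1u	/* caller may block in reclaim */
#define MODEL_KSWAPD_RECLAIM	0x2u	/* caller may only wake kswapd */
#define MODEL_RECLAIM	(MODEL_DIRECT_RECLAIM | MODEL_KSWAPD_RECLAIM)

/* Non-blocking style request: kswapd wakeup only. */
#define MODEL_GFP_NOWAIT	MODEL_KSWAPD_RECLAIM
/* Blocking style request (IO/FS bits omitted from the model). */
#define MODEL_GFP_KERNEL	MODEL_RECLAIM

/* Old test: true for nearly every allocation, atomic ones included. */
static bool old_condition(unsigned int flags)
{
	return (flags & MODEL_RECLAIM) != 0;
}

/* New test: true only when the caller is allowed to block. */
static bool new_condition(unsigned int flags)
{
	return (flags & MODEL_DIRECT_RECLAIM) != 0;
}
```

In this model a NOWAIT-style request passes the old test but not the
new one, which is the case the hard lockup trace came from.
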
 mm/kasan/kasan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 3019cecc0833..c99ef40ebdfa 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -565,7 +565,7 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
 	unsigned long redzone_start;
 	unsigned long redzone_end;
 
-	if (flags & __GFP_RECLAIM)
+	if (gfpflags_allow_blocking(flags))
 		quarantine_reduce();
 
 	if (unlikely(object == NULL))
@@ -596,7 +596,7 @@ void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
 	unsigned long redzone_start;
 	unsigned long redzone_end;
 
-	if (flags & __GFP_RECLAIM)
+	if (gfpflags_allow_blocking(flags))
 		quarantine_reduce();
 
 	if (unlikely(ptr == NULL))

From f7376aed6c032aab820fa36806a89e16e353a0d9 Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:46 -0700
Subject: [PATCH 014/111] mm/kasan, slub: don't disable interrupts when object
 leaves quarantine

SLUB doesn't require disabled interrupts to call ___cache_free().

Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/kasan/quarantine.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 65793f150d1f..4852625ff851 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -147,10 +147,14 @@ static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
 	struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
 	unsigned long flags;
 
-	local_irq_save(flags);
+	if (IS_ENABLED(CONFIG_SLAB))
+		local_irq_save(flags);
+
 	alloc_info->state = KASAN_STATE_FREE;
 	___cache_free(cache, object, _THIS_IP_);
-	local_irq_restore(flags);
+
+	if (IS_ENABLED(CONFIG_SLAB))
+		local_irq_restore(flags);
 }
 
 static void qlist_free_all(struct qlist_head *q, struct kmem_cache *cache)

From 47b5c2a0f021e90a79845d1a1353780e5edd0bce Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:49 -0700
Subject: [PATCH 015/111] mm/kasan: get rid of ->alloc_size in struct
 kasan_alloc_meta

The size of a slab object is already stored in cache->object_size.

Note that kmalloc() internally rounds up the size of an allocation, so
object_size may not be equal to alloc_size, but usually we don't need
to know the exact size of the allocated object.  If we do need that
information, we can still figure it out from the report.  The dump of
shadow memory allows us to identify the end of the allocated memory,
and thereby the exact allocation size.

Link: http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/kasan/kasan.c  | 1 -
 mm/kasan/kasan.h  | 3 +--
 mm/kasan/report.c | 8 +++-----
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index c99ef40ebdfa..388e812ccaca 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -584,7 +584,6 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
 			get_alloc_info(cache, object);
 
 		alloc_info->state = KASAN_STATE_ALLOC;
-		alloc_info->alloc_size = size;
 		set_track(&alloc_info->track, flags);
 	}
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 31972cdba433..aa175460c8f9 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -75,8 +75,7 @@ struct kasan_track {
 
 struct kasan_alloc_meta {
 	struct kasan_track track;
-	u32 state : 2;	/* enum kasan_state */
-	u32 alloc_size : 30;
+	u32 state;
 };
 
 struct qlist_node {
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 861b9776841a..d67a7e020905 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -136,7 +136,9 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page,
 	struct kasan_free_meta *free_info;
 
 	dump_stack();
-	pr_err("Object at %p, in cache %s\n", object, cache->name);
+	pr_err("Object at %p, in cache %s size: %d\n", object, cache->name,
+		cache->object_size);
+
 	if (!(cache->flags & SLAB_KASAN))
 		return;
 	switch (alloc_info->state) {
@@ -144,15 +146,11 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page,
 		pr_err("Object not allocated yet\n");
 		break;
 	case KASAN_STATE_ALLOC:
-		pr_err("Object allocated with size %u bytes.\n",
-		       alloc_info->alloc_size);
 		pr_err("Allocation:\n");
 		print_track(&alloc_info->track);
 		break;
 	case KASAN_STATE_FREE:
 	case KASAN_STATE_QUARANTINE:
-		pr_err("Object freed, allocated with size %u bytes\n",
-		       alloc_info->alloc_size);
 		free_info = get_free_info(cache, object);
 		pr_err("Allocation:\n");
 		print_track(&alloc_info->track);

From b3cbd9bf77cd1888114dbee1653e79aa23fd4068 Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:52 -0700
Subject: [PATCH 016/111] mm/kasan: get rid of ->state in struct
 kasan_alloc_meta

The state of an object is currently tracked in two places: shadow
memory, and the ->state field in struct kasan_alloc_meta.  We can get
rid of the latter.  This will save us a little bit of memory.  Also,
this allows us to move the free stack into struct kasan_alloc_meta
without increasing memory consumption.  So now we should always know
when the object was last freed.  This may be useful for long delayed
use-after-free bugs.

As a side effect this fixes the following UBSAN warning:
	UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
	member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
	which requires 8 byte alignment

Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
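Reviewer note (dropped by git-am, not part of the commit): a small
sketch of the shadow-byte test that replaces the ->state field in
kasan_slab_free().  It assumes the usual generic KASAN encoding: 0 means
the whole 8-byte granule is accessible, 1..7 means only the first N
bytes are, and negative values are poison markers.  SHADOW_SCALE_SIZE
and the function name are local to this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_SCALE_SIZE 8	/* bytes of memory per shadow byte (assumed) */

/*
 * The first granule of a live object has a shadow byte in the range
 * 0..SHADOW_SCALE_SIZE-1.  Anything else means the pointer is already
 * poisoned (freed, redzone) or was never a valid object start.
 */
static bool shadow_says_invalid_free(int8_t shadow_byte)
{
	return shadow_byte < 0 || shadow_byte >= SHADOW_SCALE_SIZE;
}
```

A poison marker such as 0xFB reads back as a negative s8, so a second
free of the same object is caught by the first half of the condition.
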
 include/linux/kasan.h |  3 +++
 mm/kasan/kasan.c      | 63 ++++++++++++++++++++-----------------------
 mm/kasan/kasan.h      | 12 ++-------
 mm/kasan/quarantine.c |  2 --
 mm/kasan/report.c     | 23 ++++------------
 mm/slab.c             |  4 ++-
 mm/slub.c             |  1 +
 7 files changed, 43 insertions(+), 65 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index c9cf374445d8..d600303306eb 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -56,6 +56,7 @@ void kasan_cache_destroy(struct kmem_cache *cache);
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
+void kasan_init_slab_obj(struct kmem_cache *cache, const void *object);
 
 void kasan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
 void kasan_kfree_large(const void *ptr);
@@ -102,6 +103,8 @@ static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
 					void *object) {}
 static inline void kasan_poison_object_data(struct kmem_cache *cache,
 					void *object) {}
+static inline void kasan_init_slab_obj(struct kmem_cache *cache,
+				const void *object) {}
 
 static inline void kasan_kmalloc_large(void *ptr, size_t size, gfp_t flags) {}
 static inline void kasan_kfree_large(const void *ptr) {}
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 388e812ccaca..92750e3b0083 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -442,11 +442,6 @@ void kasan_poison_object_data(struct kmem_cache *cache, void *object)
 	kasan_poison_shadow(object,
 			round_up(cache->object_size, KASAN_SHADOW_SCALE_SIZE),
 			KASAN_KMALLOC_REDZONE);
-	if (cache->flags & SLAB_KASAN) {
-		struct kasan_alloc_meta *alloc_info =
-			get_alloc_info(cache, object);
-		alloc_info->state = KASAN_STATE_INIT;
-	}
 }
 
 static inline int in_irqentry_text(unsigned long ptr)
@@ -510,6 +505,17 @@ struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
 	return (void *)object + cache->kasan_info.free_meta_offset;
 }
 
+void kasan_init_slab_obj(struct kmem_cache *cache, const void *object)
+{
+	struct kasan_alloc_meta *alloc_info;
+
+	if (!(cache->flags & SLAB_KASAN))
+		return;
+
+	alloc_info = get_alloc_info(cache, object);
+	__memset(alloc_info, 0, sizeof(*alloc_info));
+}
+
 void kasan_slab_alloc(struct kmem_cache *cache, void *object, gfp_t flags)
 {
 	kasan_kmalloc(cache, object, cache->object_size, flags);
@@ -529,34 +535,27 @@ static void kasan_poison_slab_free(struct kmem_cache *cache, void *object)
 
 bool kasan_slab_free(struct kmem_cache *cache, void *object)
 {
+	s8 shadow_byte;
+
 	/* RCU slabs could be legally used after free within the RCU period */
 	if (unlikely(cache->flags & SLAB_DESTROY_BY_RCU))
 		return false;
 
-	if (likely(cache->flags & SLAB_KASAN)) {
-		struct kasan_alloc_meta *alloc_info;
-		struct kasan_free_meta *free_info;
-
-		alloc_info = get_alloc_info(cache, object);
-		free_info = get_free_info(cache, object);
-
-		switch (alloc_info->state) {
-		case KASAN_STATE_ALLOC:
-			alloc_info->state = KASAN_STATE_QUARANTINE;
-			set_track(&free_info->track, GFP_NOWAIT);
-			kasan_poison_slab_free(cache, object);
-			quarantine_put(free_info, cache);
-			return true;
-		case KASAN_STATE_QUARANTINE:
-		case KASAN_STATE_FREE:
-			pr_err("Double free");
-			dump_stack();
-			break;
-		default:
-			break;
-		}
+	shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
+	if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
+		pr_err("Double free");
+		dump_stack();
+		return true;
 	}
-	return false;
+
+	kasan_poison_slab_free(cache, object);
+
+	if (unlikely(!(cache->flags & SLAB_KASAN)))
+		return false;
+
+	set_track(&get_alloc_info(cache, object)->free_track, GFP_NOWAIT);
+	quarantine_put(get_free_info(cache, object), cache);
+	return true;
 }
 
 void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
@@ -579,13 +578,9 @@ void kasan_kmalloc(struct kmem_cache *cache, const void *object, size_t size,
 	kasan_unpoison_shadow(object, size);
 	kasan_poison_shadow((void *)redzone_start, redzone_end - redzone_start,
 		KASAN_KMALLOC_REDZONE);
-	if (cache->flags & SLAB_KASAN) {
-		struct kasan_alloc_meta *alloc_info =
-			get_alloc_info(cache, object);
 
-		alloc_info->state = KASAN_STATE_ALLOC;
-		set_track(&alloc_info->track, flags);
-	}
+	if (cache->flags & SLAB_KASAN)
+		set_track(&get_alloc_info(cache, object)->alloc_track, flags);
 }
 EXPORT_SYMBOL(kasan_kmalloc);
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index aa175460c8f9..9b7b31e25fd2 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -59,13 +59,6 @@ struct kasan_global {
  * Structures to keep alloc and free tracks *
  */
 
-enum kasan_state {
-	KASAN_STATE_INIT,
-	KASAN_STATE_ALLOC,
-	KASAN_STATE_QUARANTINE,
-	KASAN_STATE_FREE
-};
-
 #define KASAN_STACK_DEPTH 64
 
 struct kasan_track {
@@ -74,8 +67,8 @@ struct kasan_track {
 };
 
 struct kasan_alloc_meta {
-	struct kasan_track track;
-	u32 state;
+	struct kasan_track alloc_track;
+	struct kasan_track free_track;
 };
 
 struct qlist_node {
@@ -86,7 +79,6 @@ struct kasan_free_meta {
 	 * Otherwise it might be used for the allocator freelist.
 	 */
 	struct qlist_node quarantine_link;
-	struct kasan_track track;
 };
 
 struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 4852625ff851..7fd121d13b88 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -144,13 +144,11 @@ static void *qlink_to_object(struct qlist_node *qlink, struct kmem_cache *cache)
 static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
 {
 	void *object = qlink_to_object(qlink, cache);
-	struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
 	unsigned long flags;
 
 	if (IS_ENABLED(CONFIG_SLAB))
 		local_irq_save(flags);
 
-	alloc_info->state = KASAN_STATE_FREE;
 	___cache_free(cache, object, _THIS_IP_);
 
 	if (IS_ENABLED(CONFIG_SLAB))
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index d67a7e020905..f437398b685a 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -133,7 +133,6 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page,
 				void *object, char *unused_reason)
 {
 	struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
-	struct kasan_free_meta *free_info;
 
 	dump_stack();
 	pr_err("Object at %p, in cache %s size: %d\n", object, cache->name,
@@ -141,23 +140,11 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page,
 
 	if (!(cache->flags & SLAB_KASAN))
 		return;
-	switch (alloc_info->state) {
-	case KASAN_STATE_INIT:
-		pr_err("Object not allocated yet\n");
-		break;
-	case KASAN_STATE_ALLOC:
-		pr_err("Allocation:\n");
-		print_track(&alloc_info->track);
-		break;
-	case KASAN_STATE_FREE:
-	case KASAN_STATE_QUARANTINE:
-		free_info = get_free_info(cache, object);
-		pr_err("Allocation:\n");
-		print_track(&alloc_info->track);
-		pr_err("Deallocation:\n");
-		print_track(&free_info->track);
-		break;
-	}
+
+	pr_err("Allocated:\n");
+	print_track(&alloc_info->alloc_track);
+	pr_err("Freed:\n");
+	print_track(&alloc_info->free_track);
 }
 
 static void print_address_description(struct kasan_access_info *info)
diff --git a/mm/slab.c b/mm/slab.c
index 09771ed3e693..ca135bd47c35 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2604,9 +2604,11 @@ static void cache_init_objs(struct kmem_cache *cachep,
 	}
 
 	for (i = 0; i < cachep->num; i++) {
+		objp = index_to_obj(cachep, page, i);
+		kasan_init_slab_obj(cachep, objp);
+
 		/* constructor could break poison info */
 		if (DEBUG == 0 && cachep->ctor) {
-			objp = index_to_obj(cachep, page, i);
 			kasan_unpoison_object_data(cachep, objp);
 			cachep->ctor(objp);
 			kasan_poison_object_data(cachep, objp);
diff --git a/mm/slub.c b/mm/slub.c
index 74e7c8c30db8..26eb6a99540e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1384,6 +1384,7 @@ static void setup_object(struct kmem_cache *s, struct page *page,
 				void *object)
 {
 	setup_object_debug(s, page, object);
+	kasan_init_slab_obj(s, object);
 	if (unlikely(s->ctor)) {
 		kasan_unpoison_object_data(s, object);
 		s->ctor(object);

From 7e088978933ee186533355ae03a9dc1de99cf6c7 Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:02:55 -0700
Subject: [PATCH 017/111] kasan: improve double-free reports

Currently we just dump the stack in case of a double free bug.
Let's dump all the info we have about the object.

[aryabinin@virtuozzo.com: change double free message per Alexander]
  Link: http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/kasan/kasan.c  |  3 +--
 mm/kasan/kasan.h  |  2 ++
 mm/kasan/report.c | 54 +++++++++++++++++++++++++++++++++--------------
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 92750e3b0083..88af13c00d3c 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -543,8 +543,7 @@ bool kasan_slab_free(struct kmem_cache *cache, void *object)
 
 	shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
 	if (shadow_byte < 0 || shadow_byte >= KASAN_SHADOW_SCALE_SIZE) {
-		pr_err("Double free");
-		dump_stack();
+		kasan_report_double_free(cache, object, shadow_byte);
 		return true;
 	}
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 9b7b31e25fd2..e5c2181fee6f 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -99,6 +99,8 @@ static inline bool kasan_report_enabled(void)
 
 void kasan_report(unsigned long addr, size_t size,
 		bool is_write, unsigned long ip);
+void kasan_report_double_free(struct kmem_cache *cache, void *object,
+			s8 shadow);
 
 #if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
 void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache);
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index f437398b685a..24c1211fe9d5 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -116,6 +116,26 @@ static inline bool init_task_stack_addr(const void *addr)
 			sizeof(init_thread_union.stack));
 }
 
+static DEFINE_SPINLOCK(report_lock);
+
+static void kasan_start_report(unsigned long *flags)
+{
+	/*
+	 * Make sure we don't end up in loop.
+	 */
+	kasan_disable_current();
+	spin_lock_irqsave(&report_lock, *flags);
+	pr_err("==================================================================\n");
+}
+
+static void kasan_end_report(unsigned long *flags)
+{
+	pr_err("==================================================================\n");
+	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+	spin_unlock_irqrestore(&report_lock, *flags);
+	kasan_enable_current();
+}
+
 static void print_track(struct kasan_track *track)
 {
 	pr_err("PID = %u\n", track->pid);
@@ -129,8 +149,7 @@ static void print_track(struct kasan_track *track)
 	}
 }
 
-static void kasan_object_err(struct kmem_cache *cache, struct page *page,
-				void *object, char *unused_reason)
+static void kasan_object_err(struct kmem_cache *cache, void *object)
 {
 	struct kasan_alloc_meta *alloc_info = get_alloc_info(cache, object);
 
@@ -147,6 +166,18 @@ static void kasan_object_err(struct kmem_cache *cache, struct page *page,
 	print_track(&alloc_info->free_track);
 }
 
+void kasan_report_double_free(struct kmem_cache *cache, void *object,
+			s8 shadow)
+{
+	unsigned long flags;
+
+	kasan_start_report(&flags);
+	pr_err("BUG: Double free or freeing an invalid pointer\n");
+	pr_err("Unexpected shadow byte: 0x%hhX\n", shadow);
+	kasan_object_err(cache, object);
+	kasan_end_report(&flags);
+}
+
 static void print_address_description(struct kasan_access_info *info)
 {
 	const void *addr = info->access_addr;
@@ -160,8 +191,7 @@ static void print_address_description(struct kasan_access_info *info)
 			struct kmem_cache *cache = page->slab_cache;
 			object = nearest_obj(cache, page,
 						(void *)info->access_addr);
-			kasan_object_err(cache, page, object,
-					"kasan: bad access detected");
+			kasan_object_err(cache, object);
 			return;
 		}
 		dump_page(page, "kasan: bad access detected");
@@ -226,19 +256,13 @@ static void print_shadow_for_address(const void *addr)
 	}
 }
 
-static DEFINE_SPINLOCK(report_lock);
-
 static void kasan_report_error(struct kasan_access_info *info)
 {
 	unsigned long flags;
 	const char *bug_type;
 
-	/*
-	 * Make sure we don't end up in loop.
-	 */
-	kasan_disable_current();
-	spin_lock_irqsave(&report_lock, flags);
-	pr_err("==================================================================\n");
+	kasan_start_report(&flags);
+
 	if (info->access_addr <
 			kasan_shadow_to_mem((void *)KASAN_SHADOW_START)) {
 		if ((unsigned long)info->access_addr < PAGE_SIZE)
@@ -259,10 +283,8 @@ static void kasan_report_error(struct kasan_access_info *info)
 		print_address_description(info);
 		print_shadow_for_address(info->first_bad_addr);
 	}
-	pr_err("==================================================================\n");
-	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
-	spin_unlock_irqrestore(&report_lock, flags);
-	kasan_enable_current();
+
+	kasan_end_report(&flags);
 }
 
 void kasan_report(unsigned long addr, size_t size,

From c3cee372282cb6bcdf19ac1457581d5dd5ecb554 Mon Sep 17 00:00:00 2001
From: Alexander Potapenko <glider@google.com>
Date: Tue, 2 Aug 2016 14:02:58 -0700
Subject: [PATCH 018/111] kasan: avoid overflowing quarantine size on low
 memory systems

If the total amount of memory assigned to quarantine is less than the
amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
may overflow.  Instead, set it to zero.

[akpm@linux-foundation.org: cleanup: use WARN_ONCE return value]
Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com
Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/kasan/quarantine.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 7fd121d13b88..b6728a33a4ac 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -198,7 +198,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
 
 void quarantine_reduce(void)
 {
-	size_t new_quarantine_size;
+	size_t new_quarantine_size, percpu_quarantines;
 	unsigned long flags;
 	struct qlist_head to_free = QLIST_INIT;
 	size_t size_to_free = 0;
@@ -216,7 +216,12 @@ void quarantine_reduce(void)
 	 */
 	new_quarantine_size = (READ_ONCE(totalram_pages) << PAGE_SHIFT) /
 		QUARANTINE_FRACTION;
-	new_quarantine_size -= QUARANTINE_PERCPU_SIZE * num_online_cpus();
+	percpu_quarantines = QUARANTINE_PERCPU_SIZE * num_online_cpus();
+	if (WARN_ONCE(new_quarantine_size < percpu_quarantines,
+		"Too little memory, disabling global KASAN quarantine.\n"))
+		new_quarantine_size = 0;
+	else
+		new_quarantine_size -= percpu_quarantines;
 	WRITE_ONCE(quarantine_size, new_quarantine_size);
 
 	last = global_quarantine.head;

From 05eb6e7263185a6bb0de9501ccf2addc52429414 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:03:01 -0700
Subject: [PATCH 019/111] radix-tree: account nodes to memcg only if explicitly
 requested

Radix trees may be used not only for storing page cache pages, so
unconditionally accounting radix tree nodes to the current memory cgroup
is bad: if a radix tree node is used for storing data shared among
different cgroups we risk pinning dead memory cgroups forever.

So let's only account radix tree nodes if it was explicitly requested by
passing __GFP_ACCOUNT to INIT_RADIX_TREE.  Currently, we only want to
account page cache entries, so mark mapping->page_tree accordingly.
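
The flag handling can be sketched in user space as follows.  This is an
illustration under stated assumptions: demo_gfp_t and the DEMO_* bit
values are made up for the example and do not match the kernel's real
gfp_t flags.

```c
/* Stand-in type and made-up flag bits; not the kernel's definitions. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_ATOMIC		0x1u
#define DEMO_GFP_ACCOUNT	0x2u

/* A tree opts in to accounting by carrying the ACCOUNT bit in its mask. */
static int demo_is_accounted(demo_gfp_t mask)
{
	return (mask & DEMO_GFP_ACCOUNT) != 0;
}

/*
 * Preloaded nodes may end up in any tree, so the preload path strips the
 * ACCOUNT bit regardless of what the caller passed.
 */
static demo_gfp_t demo_preload_mask(demo_gfp_t mask)
{
	return mask & ~DEMO_GFP_ACCOUNT;
}
```

A mask of DEMO_GFP_ATOMIC | DEMO_GFP_ACCOUNT is accounted on the normal
allocation path, but not after passing through demo_preload_mask().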

Fixes: 58e698af4c63 ("radix-tree: account radix_tree_node to memory cgroup")
Link: http://lkml.kernel.org/r/1470057188-7864-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/inode.c       |  2 +-
 lib/radix-tree.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9cef4e16aeda..ad445542c285 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -345,7 +345,7 @@ EXPORT_SYMBOL(inc_nlink);
 void address_space_init_once(struct address_space *mapping)
 {
 	memset(mapping, 0, sizeof(*mapping));
-	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
+	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC | __GFP_ACCOUNT);
 	spin_lock_init(&mapping->tree_lock);
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 61b8fb529cef..1b7bf7314141 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -277,10 +277,11 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 
 		/*
 		 * Even if the caller has preloaded, try to allocate from the
-		 * cache first for the new node to get accounted.
+		 * cache first for the new node to get accounted to the memory
+		 * cgroup.
 		 */
 		ret = kmem_cache_alloc(radix_tree_node_cachep,
-				       gfp_mask | __GFP_ACCOUNT | __GFP_NOWARN);
+				       gfp_mask | __GFP_NOWARN);
 		if (ret)
 			goto out;
 
@@ -303,8 +304,7 @@ radix_tree_node_alloc(struct radix_tree_root *root)
 		kmemleak_update_trace(ret);
 		goto out;
 	}
-	ret = kmem_cache_alloc(radix_tree_node_cachep,
-			       gfp_mask | __GFP_ACCOUNT);
+	ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
 out:
 	BUG_ON(radix_tree_is_internal_node(ret));
 	return ret;
@@ -351,6 +351,12 @@ static int __radix_tree_preload(gfp_t gfp_mask, int nr)
 	struct radix_tree_node *node;
 	int ret = -ENOMEM;
 
+	/*
+	 * Nodes preloaded by one cgroup can be be used by another cgroup, so
+	 * they should never be accounted to any particular memory cgroup.
+	 */
+	gfp_mask &= ~__GFP_ACCOUNT;
+
 	preempt_disable();
 	rtp = this_cpu_ptr(&radix_tree_preloads);
 	while (rtp->nr < nr) {

From b5afba2974f9ebbaaa11b5a633e55db9be3cc363 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov@virtuozzo.com>
Date: Tue, 2 Aug 2016 14:03:04 -0700
Subject: [PATCH 020/111] mm: vmscan: fix memcg-aware shrinkers not called on
 global reclaim

We must call shrink_slab() for each memory cgroup on both global and
memcg reclaim in shrink_node_memcg().  Commit d71df22b55099 accidentally
changed that so that now shrink_slab() is only called with memcg != NULL
on memcg reclaim.  As a result, memcg-aware shrinkers (including
dentry/inode) are never invoked on global reclaim.  Fix that.

Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Link: http://lkml.kernel.org/r/1470056590-7177-1-git-send-email-vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 650d26832569..374d95d04178 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2561,7 +2561,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 			shrink_node_memcg(pgdat, memcg, sc, &lru_pages);
 			node_lru_pages += lru_pages;
 
-			if (!global_reclaim(sc))
+			if (memcg)
 				shrink_slab(sc->gfp_mask, pgdat->node_id,
 					    memcg, sc->nr_scanned - scanned,
 					    lru_pages);

From 9b24fef9f0410fb5364245d6cc2bd044cc064007 Mon Sep 17 00:00:00 2001
From: Fabian Frederick <fabf@skynet.be>
Date: Tue, 2 Aug 2016 14:03:07 -0700
Subject: [PATCH 021/111] sysv, ipc: fix security-layer leaking

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to take an RCU freeing function, but passed the generic ipc_rcu_free()
instead of msg_rcu_free(), which also does the security cleanup.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested fixing sem.c as well, and Davidlohr Bueso
suggested using ipc_rcu_free only in case of security allocation failure
in newary().

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 ipc/msg.c |  2 +-
 ipc/sem.c | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1471db9a7e61..c6521c205cb4 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -680,7 +680,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		rcu_read_lock();
 		ipc_lock_object(&msq->q_perm);
 
-		ipc_rcu_putref(msq, ipc_rcu_free);
+		ipc_rcu_putref(msq, msg_rcu_free);
 		/* raced with RMID? */
 		if (!ipc_valid_object(&msq->q_perm)) {
 			err = -EIDRM;
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3cddc8d..7c9d4f7683c0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -438,7 +438,7 @@ static inline struct sem_array *sem_obtain_object_check(struct ipc_namespace *ns
 static inline void sem_lock_and_putref(struct sem_array *sma)
 {
 	sem_lock(sma, NULL, -1);
-	ipc_rcu_putref(sma, ipc_rcu_free);
+	ipc_rcu_putref(sma, sem_rcu_free);
 }
 
 static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
@@ -1381,7 +1381,7 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
 			rcu_read_unlock();
 			sem_io = ipc_alloc(sizeof(ushort)*nsems);
 			if (sem_io == NULL) {
-				ipc_rcu_putref(sma, ipc_rcu_free);
+				ipc_rcu_putref(sma, sem_rcu_free);
 				return -ENOMEM;
 			}
 
@@ -1415,20 +1415,20 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
 		if (nsems > SEMMSL_FAST) {
 			sem_io = ipc_alloc(sizeof(ushort)*nsems);
 			if (sem_io == NULL) {
-				ipc_rcu_putref(sma, ipc_rcu_free);
+				ipc_rcu_putref(sma, sem_rcu_free);
 				return -ENOMEM;
 			}
 		}
 
 		if (copy_from_user(sem_io, p, nsems*sizeof(ushort))) {
-			ipc_rcu_putref(sma, ipc_rcu_free);
+			ipc_rcu_putref(sma, sem_rcu_free);
 			err = -EFAULT;
 			goto out_free;
 		}
 
 		for (i = 0; i < nsems; i++) {
 			if (sem_io[i] > SEMVMX) {
-				ipc_rcu_putref(sma, ipc_rcu_free);
+				ipc_rcu_putref(sma, sem_rcu_free);
 				err = -ERANGE;
 				goto out_free;
 			}
@@ -1720,7 +1720,7 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
 	/* step 2: allocate new undo structure */
 	new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL);
 	if (!new) {
-		ipc_rcu_putref(sma, ipc_rcu_free);
+		ipc_rcu_putref(sma, sem_rcu_free);
 		return ERR_PTR(-ENOMEM);
 	}
 

From 901d805c33fc4c029fc6b2d94ee5fb7d30278045 Mon Sep 17 00:00:00 2001
From: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Date: Tue, 2 Aug 2016 14:03:10 -0700
Subject: [PATCH 022/111] UBSAN: fix typo in format string

handle_object_size_mismatch() used %pk to format a kernel pointer with
pr_err().  This seemed to be a misspelling of %pK, but using that to
format a kernel pointer does not make much sense here.

Therefore use %p instead, like in handle_missaligned_access().

Link: http://lkml.kernel.org/r/20160730083010.11569-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/ubsan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/ubsan.c b/lib/ubsan.c
index 8799ae5e2e42..fb0409df1bcf 100644
--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -308,7 +308,7 @@ static void handle_object_size_mismatch(struct type_mismatch_data *data,
 		return;
 
 	ubsan_prologue(&data->location, &flags);
-	pr_err("%s address %pk with insufficient space\n",
+	pr_err("%s address %p with insufficient space\n",
 		type_check_kinds[data->type_check_kind],
 		(void *) ptr);
 	pr_err("for an object of type %s\n", data->type->type_name);

From 9991a9c8dbd2a45b7e09176ff54ffc8c40ae7791 Mon Sep 17 00:00:00 2001
From: "seokhoon.yoon" <iamyooon@gmail.com>
Date: Tue, 2 Aug 2016 14:03:13 -0700
Subject: [PATCH 023/111] cgroup: update cgroup's document path

The cgroup documentation path has changed to "cgroup-v1".  Update it.

Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com
Signed-off-by: seokhoon.yoon <iamyooon@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 init/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 46f817abff0e..79c6aad5ea15 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -952,7 +952,7 @@ menuconfig CGROUPS
 	  controls or device isolation.
 	  See
 		- Documentation/scheduler/sched-design-CFS.txt	(CFS)
-		- Documentation/cgroups/ (features for grouping, isolation
+		- Documentation/cgroup-v1/ (features for grouping, isolation
 					  and resource control)
 
 	  Say N if unsure.
@@ -1009,7 +1009,7 @@ config BLK_CGROUP
 	CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set
 	CONFIG_BLK_DEV_THROTTLING=y.
 
-	See Documentation/cgroups/blkio-controller.txt for more information.
+	See Documentation/cgroup-v1/blkio-controller.txt for more information.
 
 config DEBUG_BLK_CGROUP
 	bool "IO controller debugging"

From db4ad0360c54a4e3bf07de9dfe429259ca8dc223 Mon Sep 17 00:00:00 2001
From: Luis de Bethencourt <luisbg@osg.samsung.com>
Date: Tue, 2 Aug 2016 14:03:16 -0700
Subject: [PATCH 024/111] MAINTAINERS: befs: add new maintainers

Salah Triki and Luis de Bethencourt are taking over maintainership of
befs.

Link: http://lkml.kernel.org/r/1469651079-32455-1-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 MAINTAINERS | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 25f43204014d..bb51bbbc9e1d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2346,7 +2346,10 @@ S:	Supported
 F:	drivers/media/platform/sti/bdisp
 
 BEFS FILE SYSTEM
-S:	Orphan
+M:	Luis de Bethencourt <luisbg@osg.samsung.com>
+M:	Salah Triki <salah.triki@gmail.com>
+S:	Maintained
+T:	git git://github.com/luisbg/linux-befs.git
 F:	Documentation/filesystems/befs.txt
 F:	fs/befs/
 

From ef419398b68925f21fd3d8463c7bf6934d2ec926 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 2 Aug 2016 14:03:19 -0700
Subject: [PATCH 025/111] proc_oom_score: remove tasklist_lock and pid_alive()

This was needed before to ensure that ->signal != 0 and do_each_thread()
is safe, see commit b95c35e76b29b ("oom: fix the unsafe usage of
badness() in proc_oom_score()") for details.

Today tsk->signal can't go away and for_each_thread(tsk) is always safe.

Link: http://lkml.kernel.org/r/20160608211921.GA15508@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/proc/base.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 31370da2ee7c..54e270262979 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -579,11 +579,8 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long totalpages = totalram_pages + total_swap_pages;
 	unsigned long points = 0;
 
-	read_lock(&tasklist_lock);
-	if (pid_alive(task))
-		points = oom_badness(task, NULL, NULL, totalpages) *
-						1000 / totalpages;
-	read_unlock(&tasklist_lock);
+	points = oom_badness(task, NULL, NULL, totalpages) *
+					1000 / totalpages;
 	seq_printf(m, "%lu\n", points);
 
 	return 0;

From 519ded5a89ec0e46e5b0867ba9f5752239b73898 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 2 Aug 2016 14:03:22 -0700
Subject: [PATCH 026/111] procfs: avoid 32-bit time_t in /proc/*/stat

/proc/stat shows (among lots of other things) the current boottime (i.e.
number of seconds since boot).  While a 32-bit number is sufficient for
this particular case, we want to get rid of 'struct timespec', which
suffers from a 32-bit overflow in 2038.

This changes the code to use a struct timespec64, which is known to be
safe in all cases.
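
For illustration, the 2038 limit comes from storing seconds in a signed
32-bit field.  The sketch below is a user-space example, not code from
this patch, and fits_in_time32() is a hypothetical helper:

```c
#include <stdint.h>

/*
 * Hypothetical helper: report whether a seconds count can be stored in a
 * signed 32-bit field without overflowing.
 */
static int fits_in_time32(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

INT32_MAX seconds after the Epoch is 03:14:07 UTC on 19 January 2038;
one second later no longer fits, while a 64-bit tv_sec has no such limit
in practice.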

Link: http://lkml.kernel.org/r/20160617201247.2292101-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/proc/stat.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 510413eb25b8..7907e456ac4f 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -80,19 +80,17 @@ static u64 get_iowait_time(int cpu)
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i, j;
-	unsigned long jif;
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
 	u64 guest, guest_nice;
 	u64 sum = 0;
 	u64 sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
-	struct timespec boottime;
+	struct timespec64 boottime;
 
 	user = nice = system = idle = iowait =
 		irq = softirq = steal = 0;
 	guest = guest_nice = 0;
-	getboottime(&boottime);
-	jif = boottime.tv_sec;
+	getboottime64(&boottime);
 
 	for_each_possible_cpu(i) {
 		user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
@@ -163,12 +161,12 @@ static int show_stat(struct seq_file *p, void *v)
 
 	seq_printf(p,
 		"\nctxt %llu\n"
-		"btime %lu\n"
+		"btime %llu\n"
 		"processes %lu\n"
 		"procs_running %lu\n"
 		"procs_blocked %lu\n",
 		nr_context_switches(),
-		(unsigned long)jif,
+		(unsigned long long)boottime.tv_sec,
 		total_forks,
 		nr_running(),
 		nr_iowait());

From ca52953f5f24aff0aa8fa8de750b76ba0302142d Mon Sep 17 00:00:00 2001
From: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Date: Tue, 2 Aug 2016 14:03:25 -0700
Subject: [PATCH 027/111] fs/proc/task_mmu.c: suppress compilation warnings
 with W=1

Suppress a bunch of warnings of the form:

  fs/proc/task_mmu.c: In function 'show_smap_vma_flags':
  fs/proc/task_mmu.c:635:22: warning: initialized field overwritten [-Woverride-init]
     [ilog2(VM_READ)] = "rd",
                        ^~~~
  fs/proc/task_mmu.c:635:22: note: (near initialization for 'mnemonics[0]')

They happen because of the way we intentionally build the table, so
silence the warning when building with 'make W=1'.
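
A reduced example of the pattern that triggers the warning, assuming the
table first gives every slot a default and then overrides the known ones
(demo_names and its contents are made up for this sketch):

```c
/*
 * Made-up table for illustration.  Slot 0 is initialized twice on
 * purpose; in C the later designated initializer wins, and that is what
 * -Woverride-init reports.
 */
static const char *const demo_names[2] = {
	[0] = "??",	/* default for every slot */
	[1] = "??",
	[0] = "rd",	/* intentional override of slot 0 */
};
```

Here demo_names[0] ends up as "rd" and demo_names[1] keeps the "??"
default, so the override is deliberate and the warning is only noise.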

Link: http://lkml.kernel.org/r/8727.1470022083@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/proc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 7151ea428041..a8c13605b434 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -4,6 +4,7 @@
 
 obj-y   += proc.o
 
+CFLAGS_task_mmu.o	+= -Wno-override-init
 proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= task_mmu.o
 

From bc083a64b6c035135c0f80718f9e9192cc0867c6 Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard@nod.at>
Date: Tue, 2 Aug 2016 14:03:27 -0700
Subject: [PATCH 028/111] init/Kconfig: make COMPILE_TEST depend on !UML

UML is a bit special since it does not have iomem nor dma.  That means a
lot of drivers will not build if they miss a dependency on HAS_IOMEM.
s390 used to have the same issues but since it gained PCI support UML is
the only stranger.

We are tired of patching dozens of new drivers after every merge window
just to un-break allmod/yesconfig UML builds.  One could argue that a
decent driver has to know what it depends on and therefore a missing
HAS_IOMEM dependency is a clear driver bug.  But the dependency is not
obvious and not everyone does UML builds with COMPILE_TEST enabled when
developing a device driver.

A possible solution to make these builds succeed on UML would be
providing stub functions for ioremap() and friends which fail at
runtime.  Another one is simply disabling COMPILE_TEST for UML.  Since
it is the least hassle and does not force us to fake iomem support,
let's do the latter.

Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/init/Kconfig b/init/Kconfig
index 79c6aad5ea15..8f08f49a7c39 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -55,6 +55,7 @@ config CROSS_COMPILE
 
 config COMPILE_TEST
 	bool "Compile also drivers which will not load"
+	depends on !UML
 	default n
 	help
 	  Some drivers can be compiled on a different platform than they are

From ca945e71529c69f71b773b31f03a681876872117 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Tue, 2 Aug 2016 14:03:30 -0700
Subject: [PATCH 029/111] memstick: don't allocate unused major for ms_block

When alloc_disk(0) is used the ->major number is completely ignored.
All devices are allocated with a major of BLOCK_EXT_MAJOR.

So remove registration and deregistration of 'major'.

Link: http://lkml.kernel.org/r/20160602064318.4403.49955.stgit@noble
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/memstick/core/ms_block.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 40bb8ae5853c..aacf584f2a42 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -2338,23 +2338,11 @@ static struct memstick_driver msb_driver = {
 	.resume   = msb_resume
 };
 
-static int major;
-
 static int __init msb_init(void)
 {
-	int rc = register_blkdev(0, DRIVER_NAME);
-
-	if (rc < 0) {
-		pr_err("failed to register major (error %d)\n", rc);
-		return rc;
-	}
-
-	major = rc;
-	rc = memstick_register_driver(&msb_driver);
-	if (rc) {
-		unregister_blkdev(major, DRIVER_NAME);
+	int rc = memstick_register_driver(&msb_driver);
+	if (rc)
 		pr_err("failed to register memstick driver (error %d)\n", rc);
-	}
 
 	return rc;
 }
@@ -2362,7 +2350,6 @@ static int __init msb_init(void)
 static void __exit msb_exit(void)
 {
 	memstick_unregister_driver(&msb_driver);
-	unregister_blkdev(major, DRIVER_NAME);
 	idr_destroy(&msb_disk_idr);
 }
 

From bd721ea73e1f965569b40620538c942001f76294 Mon Sep 17 00:00:00 2001
From: Fabian Frederick <fabf@skynet.be>
Date: Tue, 2 Aug 2016 14:03:33 -0700
Subject: [PATCH 030/111] treewide: replace obsolete _refok by __ref

There was only one use of __initdata_refok and __exit_refok

__init_refok was used 46 times against 82 for __ref.

Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/alpha/kernel/machvec_impl.h         | 2 +-
 arch/arc/mm/init.c                       | 2 +-
 arch/arm/mach-integrator/impd1.c         | 4 ++--
 arch/arm/mach-mv78xx0/common.c           | 2 +-
 arch/blackfin/mm/init.c                  | 2 +-
 arch/hexagon/mm/init.c                   | 2 +-
 arch/ia64/kernel/mca.c                   | 2 +-
 arch/microblaze/mm/init.c                | 4 ++--
 arch/microblaze/mm/pgtable.c             | 2 +-
 arch/mips/mm/init.c                      | 2 +-
 arch/mips/txx9/generic/pci.c             | 2 +-
 arch/nios2/mm/init.c                     | 2 +-
 arch/openrisc/mm/ioremap.c               | 4 ++--
 arch/powerpc/lib/alloc.c                 | 2 +-
 arch/powerpc/mm/pgtable_32.c             | 2 +-
 arch/powerpc/platforms/powermac/setup.c  | 4 ++--
 arch/powerpc/platforms/ps3/device-init.c | 2 +-
 arch/powerpc/sysdev/msi_bitmap.c         | 2 +-
 arch/score/mm/init.c                     | 2 +-
 arch/sh/drivers/pci/pci.c                | 4 ++--
 arch/sh/mm/ioremap.c                     | 2 +-
 arch/x86/mm/init.c                       | 4 ++--
 arch/x86/platform/efi/early_printk.c     | 4 ++--
 drivers/acpi/osl.c                       | 5 ++---
 drivers/base/node.c                      | 2 +-
 drivers/clk/clkdev.c                     | 4 ++--
 drivers/pci/xen-pcifront.c               | 2 +-
 drivers/video/logo/logo.c                | 4 ++--
 include/acpi/acpi_io.h                   | 2 +-
 include/linux/init.h                     | 6 ------
 include/net/net_namespace.h              | 2 +-
 init/main.c                              | 2 +-
 mm/page_alloc.c                          | 4 ++--
 mm/slab.c                                | 2 +-
 mm/sparse-vmemmap.c                      | 2 +-
 mm/sparse.c                              | 2 +-
 36 files changed, 46 insertions(+), 53 deletions(-)

diff --git a/arch/alpha/kernel/machvec_impl.h b/arch/alpha/kernel/machvec_impl.h
index f54bdf658cd0..d3398f6ab74c 100644
--- a/arch/alpha/kernel/machvec_impl.h
+++ b/arch/alpha/kernel/machvec_impl.h
@@ -137,7 +137,7 @@
 #define __initmv __initdata
 #define ALIAS_MV(x)
 #else
-#define __initmv __initdata_refok
+#define __initmv __refdata
 
 /* GCC actually has a syntax for defining aliases, but is under some
    delusion that you shouldn't be able to declare it extern somewhere
diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index 8be930394750..399e2f223d25 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -220,7 +220,7 @@ void __init mem_init(void)
 /*
  * free_initmem: Free all the __init memory.
  */
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 	free_initmem_default(-1);
 }
diff --git a/arch/arm/mach-integrator/impd1.c b/arch/arm/mach-integrator/impd1.c
index 38b0da300dd5..ed9a01484030 100644
--- a/arch/arm/mach-integrator/impd1.c
+++ b/arch/arm/mach-integrator/impd1.c
@@ -320,11 +320,11 @@ static struct impd1_device impd1_devs[] = {
 #define IMPD1_VALID_IRQS 0x00000bffU
 
 /*
- * As this module is bool, it is OK to have this as __init_refok() - no
+ * As this module is bool, it is OK to have this as __ref() - no
  * probe calls will be done after the initial system bootup, as devices
  * are discovered as part of the machine startup.
  */
-static int __init_refok impd1_probe(struct lm_device *dev)
+static int __ref impd1_probe(struct lm_device *dev)
 {
 	struct impd1_module *impd1;
 	int irq_base;
diff --git a/arch/arm/mach-mv78xx0/common.c b/arch/arm/mach-mv78xx0/common.c
index 45a05207b418..6af5430d0d97 100644
--- a/arch/arm/mach-mv78xx0/common.c
+++ b/arch/arm/mach-mv78xx0/common.c
@@ -343,7 +343,7 @@ void __init mv78xx0_init_early(void)
 				DDR_WINDOW_CPU1_BASE, DDR_WINDOW_CPU_SZ);
 }
 
-void __init_refok mv78xx0_timer_init(void)
+void __ref mv78xx0_timer_init(void)
 {
 	orion_time_init(BRIDGE_VIRT_BASE, BRIDGE_INT_TIMER1_CLR,
 			IRQ_MV78XX0_TIMER_1, get_tclk());
diff --git a/arch/blackfin/mm/init.c b/arch/blackfin/mm/init.c
index 166842de3dc7..b59cd7c3261a 100644
--- a/arch/blackfin/mm/init.c
+++ b/arch/blackfin/mm/init.c
@@ -112,7 +112,7 @@ void __init free_initrd_mem(unsigned long start, unsigned long end)
 }
 #endif
 
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 #if defined CONFIG_RAMKERNEL && !defined CONFIG_MPU
 	free_initmem_default(-1);
diff --git a/arch/hexagon/mm/init.c b/arch/hexagon/mm/init.c
index 88977e42af0a..192584d5ac2f 100644
--- a/arch/hexagon/mm/init.c
+++ b/arch/hexagon/mm/init.c
@@ -93,7 +93,7 @@ void __init mem_init(void)
  * Todo:  free pages between __init_begin and __init_end; possibly
  * some devtree related stuff as well.
  */
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 }
 
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index 07a4e32ae96a..eb9220cde76c 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -1831,7 +1831,7 @@ format_mca_init_stack(void *mca_data, unsigned long offset,
 }
 
 /* Caller prevents this from being called after init */
-static void * __init_refok mca_bootmem(void)
+static void * __ref mca_bootmem(void)
 {
 	return __alloc_bootmem(sizeof(struct ia64_mca_cpu),
 	                    KERNEL_STACK_SIZE, 0);
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index 77bc7c7e6522..434639f9a3a6 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -414,7 +414,7 @@ void __init *early_get_page(void)
 
 #endif /* CONFIG_MMU */
 
-void * __init_refok alloc_maybe_bootmem(size_t size, gfp_t mask)
+void * __ref alloc_maybe_bootmem(size_t size, gfp_t mask)
 {
 	if (mem_init_done)
 		return kmalloc(size, mask);
@@ -422,7 +422,7 @@ void * __init_refok alloc_maybe_bootmem(size_t size, gfp_t mask)
 		return alloc_bootmem(size);
 }
 
-void * __init_refok zalloc_maybe_bootmem(size_t size, gfp_t mask)
+void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask)
 {
 	void *p;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index eb99fcc76088..cc732fe357ad 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -234,7 +234,7 @@ unsigned long iopa(unsigned long addr)
 	return pa;
 }
 
-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 		unsigned long address)
 {
 	pte_t *pte;
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 9b58eb5fd0d5..a5509e7dcad2 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -504,7 +504,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 
 void (*free_init_pages_eva)(void *begin, void *end) = NULL;
 
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 	prom_free_prom_memory();
 	/*
diff --git a/arch/mips/txx9/generic/pci.c b/arch/mips/txx9/generic/pci.c
index a77698ff2b6f..1f6bc9a3036c 100644
--- a/arch/mips/txx9/generic/pci.c
+++ b/arch/mips/txx9/generic/pci.c
@@ -268,7 +268,7 @@ static int txx9_i8259_irq_setup(int irq)
 	return err;
 }
 
-static void __init_refok quirk_slc90e66_bridge(struct pci_dev *dev)
+static void __ref quirk_slc90e66_bridge(struct pci_dev *dev)
 {
 	int irq;	/* PCI/ISA Bridge interrupt */
 	u8 reg_64;
diff --git a/arch/nios2/mm/init.c b/arch/nios2/mm/init.c
index e75c75d249d6..c92fe4234009 100644
--- a/arch/nios2/mm/init.c
+++ b/arch/nios2/mm/init.c
@@ -89,7 +89,7 @@ void __init free_initrd_mem(unsigned long start, unsigned long end)
 }
 #endif
 
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 	free_initmem_default(-1);
 }
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index 5b2a95116e8f..fa60b81aee3e 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -38,7 +38,7 @@ static unsigned int fixmaps_used __initdata;
  * have to convert them into an offset in a page-aligned mapping, but the
  * caller shouldn't need to know that small detail.
  */
-void __iomem *__init_refok
+void __iomem *__ref
 __ioremap(phys_addr_t addr, unsigned long size, pgprot_t prot)
 {
 	phys_addr_t p;
@@ -116,7 +116,7 @@ void iounmap(void *addr)
  * the memblock infrastructure.
  */
 
-pte_t __init_refok *pte_alloc_one_kernel(struct mm_struct *mm,
+pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm,
 					 unsigned long address)
 {
 	pte_t *pte;
diff --git a/arch/powerpc/lib/alloc.c b/arch/powerpc/lib/alloc.c
index 60b0b3fc8fc1..a58abe4afbd1 100644
--- a/arch/powerpc/lib/alloc.c
+++ b/arch/powerpc/lib/alloc.c
@@ -6,7 +6,7 @@
 #include <asm/setup.h>
 
 
-void * __init_refok zalloc_maybe_bootmem(size_t size, gfp_t mask)
+void * __ref zalloc_maybe_bootmem(size_t size, gfp_t mask)
 {
 	void *p;
 
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 7f922f557936..0ae0572bc239 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -79,7 +79,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 #endif
 }
 
-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+__ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
 	pte_t *pte;
 
diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c
index 3de4a7c85140..6b4e9d181126 100644
--- a/arch/powerpc/platforms/powermac/setup.c
+++ b/arch/powerpc/platforms/powermac/setup.c
@@ -353,12 +353,12 @@ static int pmac_late_init(void)
 machine_late_initcall(powermac, pmac_late_init);
 
 /*
- * This is __init_refok because we check for "initializing" before
+ * This is __ref because we check for "initializing" before
  * touching any of the __init sensitive things and "initializing"
  * will be false after __init time. This can't be __init because it
  * can be called whenever a disk is first accessed.
  */
-void __init_refok note_bootable_part(dev_t dev, int part, int goodness)
+void __ref note_bootable_part(dev_t dev, int part, int goodness)
 {
 	char *p;
 
diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
index 3f175e8aedb4..57caaf11a83f 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -189,7 +189,7 @@ fail_malloc:
 	return result;
 }
 
-static int __init_refok ps3_setup_uhc_device(
+static int __ref ps3_setup_uhc_device(
 	const struct ps3_repository_device *repo, enum ps3_match_id match_id,
 	enum ps3_interrupt_type interrupt_type, enum ps3_reg_type reg_type)
 {
diff --git a/arch/powerpc/sysdev/msi_bitmap.c b/arch/powerpc/sysdev/msi_bitmap.c
index ed5234ed8d3f..5ebd3f018295 100644
--- a/arch/powerpc/sysdev/msi_bitmap.c
+++ b/arch/powerpc/sysdev/msi_bitmap.c
@@ -112,7 +112,7 @@ int msi_bitmap_reserve_dt_hwirqs(struct msi_bitmap *bmp)
 	return 0;
 }
 
-int __init_refok msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
+int __ref msi_bitmap_alloc(struct msi_bitmap *bmp, unsigned int irq_count,
 		     struct device_node *of_node)
 {
 	int size;
diff --git a/arch/score/mm/init.c b/arch/score/mm/init.c
index 9fbce49ad3bd..444c26c0f750 100644
--- a/arch/score/mm/init.c
+++ b/arch/score/mm/init.c
@@ -91,7 +91,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 }
 #endif
 
-void __init_refok free_initmem(void)
+void __ref free_initmem(void)
 {
 	free_initmem_default(POISON_FREE_INITMEM);
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index d5462b7bc514..84563e39a5b8 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -221,7 +221,7 @@ pcibios_bus_report_status_early(struct pci_channel *hose,
  * We can't use pci_find_device() here since we are
  * called from interrupt context.
  */
-static void __init_refok
+static void __ref
 pcibios_bus_report_status(struct pci_bus *bus, unsigned int status_mask,
 			  int warn)
 {
@@ -256,7 +256,7 @@ pcibios_bus_report_status(struct pci_bus *bus, unsigned int status_mask,
 			pcibios_bus_report_status(dev->subordinate, status_mask, warn);
 }
 
-void __init_refok pcibios_report_status(unsigned int status_mask, int warn)
+void __ref pcibios_report_status(unsigned int status_mask, int warn)
 {
 	struct pci_channel *hose;
 
diff --git a/arch/sh/mm/ioremap.c b/arch/sh/mm/ioremap.c
index 0c99ec2e7ed8..d09ddfe58fd8 100644
--- a/arch/sh/mm/ioremap.c
+++ b/arch/sh/mm/ioremap.c
@@ -34,7 +34,7 @@
  * have to convert them into an offset in a page-aligned mapping, but the
  * caller shouldn't need to know that small detail.
  */
-void __iomem * __init_refok
+void __iomem * __ref
 __ioremap_caller(phys_addr_t phys_addr, unsigned long size,
 		 pgprot_t pgprot, void *caller)
 {
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index fb4c1b42fc7e..620928903be3 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -208,7 +208,7 @@ static int __meminit save_mr(struct map_range *mr, int nr_range,
  * adjust the page_size_mask for small range to go with
  *	big page size instead small one if nearby are ram too.
  */
-static void __init_refok adjust_range_page_size_mask(struct map_range *mr,
+static void __ref adjust_range_page_size_mask(struct map_range *mr,
 							 int nr_range)
 {
 	int i;
@@ -396,7 +396,7 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
  * This runs before bootmem is initialized and gets pages directly from
  * the physical memory. To access them they are temporarily mapped.
  */
-unsigned long __init_refok init_memory_mapping(unsigned long start,
+unsigned long __ref init_memory_mapping(unsigned long start,
 					       unsigned long end)
 {
 	struct map_range mr[NR_RANGE_MR];
diff --git a/arch/x86/platform/efi/early_printk.c b/arch/x86/platform/efi/early_printk.c
index 524142117296..5fdacb322ceb 100644
--- a/arch/x86/platform/efi/early_printk.c
+++ b/arch/x86/platform/efi/early_printk.c
@@ -44,7 +44,7 @@ early_initcall(early_efi_map_fb);
  * In case earlyprintk=efi,keep we have the whole framebuffer mapped already
  * so just return the offset efi_fb + start.
  */
-static __init_refok void *early_efi_map(unsigned long start, unsigned long len)
+static __ref void *early_efi_map(unsigned long start, unsigned long len)
 {
 	unsigned long base;
 
@@ -56,7 +56,7 @@ static __init_refok void *early_efi_map(unsigned long start, unsigned long len)
 		return early_ioremap(base + start, len);
 }
 
-static __init_refok void early_efi_unmap(void *addr, unsigned long len)
+static __ref void early_efi_unmap(void *addr, unsigned long len)
 {
 	if (!efi_fb)
 		early_iounmap(addr, len);
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index b108f1358a32..4305ee9db4b2 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -309,7 +309,7 @@ static void acpi_unmap(acpi_physical_address pg_off, void __iomem *vaddr)
  * During early init (when acpi_gbl_permanent_mmap has not been set yet) this
  * routine simply calls __acpi_map_table() to get the job done.
  */
-void __iomem *__init_refok
+void __iomem *__ref
 acpi_os_map_iomem(acpi_physical_address phys, acpi_size size)
 {
 	struct acpi_ioremap *map;
@@ -362,8 +362,7 @@ out:
 }
 EXPORT_SYMBOL_GPL(acpi_os_map_iomem);
 
-void *__init_refok
-acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
+void *__ref acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 {
 	return (void *)acpi_os_map_iomem(phys, size);
 }
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 29cd96661b30..5548f9686016 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -370,7 +370,7 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 #define page_initialized(page)  (page->lru.next)
 
-static int __init_refok get_nid_for_pfn(unsigned long pfn)
+static int __ref get_nid_for_pfn(unsigned long pfn)
 {
 	struct page *page;
 
diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 89cc700fbc37..97ae60fa1584 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -250,7 +250,7 @@ struct clk_lookup_alloc {
 	char	con_id[MAX_CON_ID];
 };
 
-static struct clk_lookup * __init_refok
+static struct clk_lookup * __ref
 vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
 	va_list ap)
 {
@@ -287,7 +287,7 @@ vclkdev_create(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
 	return cl;
 }
 
-struct clk_lookup * __init_refok
+struct clk_lookup * __ref
 clkdev_alloc(struct clk *clk, const char *con_id, const char *dev_fmt, ...)
 {
 	struct clk_lookup *cl;
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index 5f70fee59a94..d6ff5e82377d 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -1086,7 +1086,7 @@ out:
 	return err;
 }
 
-static void __init_refok pcifront_backend_changed(struct xenbus_device *xdev,
+static void __ref pcifront_backend_changed(struct xenbus_device *xdev,
 						  enum xenbus_state be_state)
 {
 	struct pcifront_device *pdev = dev_get_drvdata(&xdev->dev);
diff --git a/drivers/video/logo/logo.c b/drivers/video/logo/logo.c
index 10fbfd8ab963..b6bc4a0bda2a 100644
--- a/drivers/video/logo/logo.c
+++ b/drivers/video/logo/logo.c
@@ -36,11 +36,11 @@ static int __init fb_logo_late_init(void)
 
 late_initcall(fb_logo_late_init);
 
-/* logo's are marked __initdata. Use __init_refok to tell
+/* logo's are marked __initdata. Use __ref to tell
  * modpost that it is intended that this function uses data
  * marked __initdata.
  */
-const struct linux_logo * __init_refok fb_find_logo(int depth)
+const struct linux_logo * __ref fb_find_logo(int depth)
 {
 	const struct linux_logo *logo = NULL;
 
diff --git a/include/acpi/acpi_io.h b/include/acpi/acpi_io.h
index dd86c5fc102d..d7d0f495a34e 100644
--- a/include/acpi/acpi_io.h
+++ b/include/acpi/acpi_io.h
@@ -13,7 +13,7 @@ static inline void __iomem *acpi_os_ioremap(acpi_physical_address phys,
 }
 #endif
 
-void __iomem *__init_refok
+void __iomem *__ref
 acpi_os_map_iomem(acpi_physical_address phys, acpi_size size);
 void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size);
 void __iomem *acpi_os_get_iomem(acpi_physical_address phys, unsigned int size);
diff --git a/include/linux/init.h b/include/linux/init.h
index aedb254abc37..6935d02474aa 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -77,12 +77,6 @@
 #define __refdata        __section(.ref.data)
 #define __refconst       __constsection(.ref.rodata)
 
-/* compatibility defines */
-#define __init_refok     __ref
-#define __initdata_refok __refdata
-#define __exit_refok     __ref
-
-
 #ifdef MODULE
 #define __exitused
 #else
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc6e9c0..0933c7455a30 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -275,7 +275,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #define __net_initconst
 #else
 #define __net_init	__init
-#define __net_exit	__exit_refok
+#define __net_exit	__ref
 #define __net_initdata	__initdata
 #define __net_initconst	__initconst
 #endif
diff --git a/init/main.c b/init/main.c
index eae02aa03c9e..e7345dcaaf05 100644
--- a/init/main.c
+++ b/init/main.c
@@ -380,7 +380,7 @@ static void __init setup_command_line(char *command_line)
 
 static __initdata DECLARE_COMPLETION(kthreadd_done);
 
-static noinline void __init_refok rest_init(void)
+static noinline void __ref rest_init(void)
 {
 	int pid;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ea759b935360..39a372a2a1d6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5276,7 +5276,7 @@ void __init setup_per_cpu_pageset(void)
 		setup_zone_pageset(zone);
 }
 
-static noinline __init_refok
+static noinline __ref
 int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
 {
 	int i;
@@ -5903,7 +5903,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat)
 	}
 }
 
-static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
+static void __ref alloc_node_mem_map(struct pglist_data *pgdat)
 {
 	unsigned long __maybe_unused start = 0;
 	unsigned long __maybe_unused offset = 0;
diff --git a/mm/slab.c b/mm/slab.c
index ca135bd47c35..261147ba156f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1877,7 +1877,7 @@ static struct array_cache __percpu *alloc_kmem_cache_cpus(
 	return cpu_cache;
 }
 
-static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
+static int __ref setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
 	if (slab_state >= FULL)
 		return enable_cpucache(cachep, gfp);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 68885dcbaf40..574c67b663fe 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -36,7 +36,7 @@
  * Uses the main allocators if they are available, else bootmem.
  */
 
-static void * __init_refok __earlyonly_bootmem_alloc(int node,
+static void * __ref __earlyonly_bootmem_alloc(int node,
 				unsigned long size,
 				unsigned long align,
 				unsigned long goal)
diff --git a/mm/sparse.c b/mm/sparse.c
index 36d7bbb80e49..1e168bf2779a 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -59,7 +59,7 @@ static inline void set_section_nid(unsigned long section_nr, int nid)
 #endif
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
-static struct mem_section noinline __init_refok *sparse_index_alloc(int nid)
+static noinline struct mem_section __ref *sparse_index_alloc(int nid)
 {
 	struct mem_section *section = NULL;
 	unsigned long array_size = SECTIONS_PER_ROOT *

From db3f60012482756f46cc4d7d9ad7d793ae30360c Mon Sep 17 00:00:00 2001
From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 2 Aug 2016 14:03:36 -0700
Subject: [PATCH 031/111] uapi: move forward declarations of internal
 structures

Don't use forward declarations of internal kernel structures in headers
exported to userspace.

Move "struct completion;".
Move "struct task_struct;".

Link: http://lkml.kernel.org/r/20160713215808.GA22486@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/capability.h      | 1 +
 include/linux/sysctl.h          | 1 +
 include/uapi/linux/capability.h | 2 --
 include/uapi/linux/sysctl.h     | 2 --
 4 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 5f3c63dde2d5..dbc21c719ce6 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -38,6 +38,7 @@ struct cpu_vfs_cap_data {
 struct file;
 struct inode;
 struct dentry;
+struct task_struct;
 struct user_namespace;
 
 extern const kernel_cap_t __cap_empty_set;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fa7bc29925c9..697e160c78d0 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -28,6 +28,7 @@
 #include <uapi/linux/sysctl.h>
 
 /* For the /proc/sys support */
+struct completion;
 struct ctl_table;
 struct nsproxy;
 struct ctl_table_root;
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 12c37a197d24..49bc06295398 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -15,8 +15,6 @@
 
 #include <linux/types.h>
 
-struct task_struct;
-
 /* User-level do most of the mapping between kernel and user
    capabilities based on the version tag given by the kernel. The
    kernel might be somewhat backwards compatible, but don't bet on
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 0956373b56db..d2b12152e358 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -26,8 +26,6 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 
-struct completion;
-
 #define CTL_MAXNAME 10		/* how many path components do we allow in a
 				   call to sysctl?   In other words, what is
 				   the largest acceptable value for the nlen

From bd804ba12525c13a096f64f305a169c654a706e7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Linus=20L=C3=BCssing?= <linus.luessing@c0d3.blue>
Date: Tue, 2 Aug 2016 14:03:39 -0700
Subject: [PATCH 032/111] =?UTF-8?q?mailmap:=20add=20Linus=20L=C3=BCssing?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First, map all non-umlaut spellings of the name to the umlaut one
(Linus Luessing -> Linus Lüssing).

Second, map obsolete email addresses to the current @c0d3.blue one.

Link: http://lkml.kernel.org/r/1467805371-2773-1-git-send-email-linus.luessing@c0d3.blue
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 .mailmap | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.mailmap b/.mailmap
index d2acafb09e60..c0d57049d3f0 100644
--- a/.mailmap
+++ b/.mailmap
@@ -92,6 +92,8 @@ Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>
 Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
 Leonid I Ananiev <leonid.i.ananiev@intel.com>
 Linas Vepstas <linas@austin.ibm.com>
+Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@web.de>
+Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@ascom.ch>
 Mark Brown <broonie@sirena.org.uk>
 Matthieu CASTET <castet.matthieu@free.fr>
 Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com> <mchehab@infradead.org> <mchehab@redhat.com> <m.chehab@samsung.com> <mchehab@osg.samsung.com> <mchehab@s-opensource.com>

From 949bed2f5764435715e3d6dd3ab6dd4dbd890a71 Mon Sep 17 00:00:00 2001
From: Chen Gang <chengang@emindsoft.com.cn>
Date: Tue, 2 Aug 2016 14:03:42 -0700
Subject: [PATCH 033/111] include: mman: use bool instead of int for the return
 value of arch_validate_prot

For a function whose return value is purely boolean, bool is a slightly
better return type than int.
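
As a minimal user-space sketch of the idea (not the kernel code itself), a
predicate that only answers yes or no can return bool directly. The name
validate_prot_sketch() is hypothetical, and the accepted flag set is
reduced to the three PROT_* bits that POSIX <sys/mman.h> provides.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical user-space analogue of arch_validate_prot(). */
static inline bool validate_prot_sketch(unsigned long prot)
{
	/* Valid only if no bit outside the known set is present. */
	return (prot & ~(unsigned long)(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0;
}
```

Callers that test the result in a condition are unaffected by the type
change; only the declared intent becomes clearer.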

Link: http://lkml.kernel.org/r/1469331815-2026-1-git-send-email-chengang@emindsoft.com.cn
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/powerpc/include/asm/mman.h | 8 ++++----
 include/linux/mman.h            | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 2563c435a4b1..fc420cedecae 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -31,13 +31,13 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
 	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
-		return 0;
+		return false;
 	if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-		return 0;
-	return 1;
+		return false;
+	return true;
 }
 #define arch_validate_prot(prot) arch_validate_prot(prot)
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 33e17f6a327a..634c4c51fe3a 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -49,7 +49,7 @@ static inline void vm_unacct_memory(long pages)
  *
  * Returns true if the prot flags are valid
  */
-static inline int arch_validate_prot(unsigned long prot)
+static inline bool arch_validate_prot(unsigned long prot)
 {
 	return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
 }

From 61e96496d3c949701a48b908f99f4ed891cd1101 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 2 Aug 2016 14:03:44 -0700
Subject: [PATCH 034/111] task_work: use READ_ONCE/lockless_dereference, avoid
 pi_lock if !task_works

Change task_work_cancel() to use lockless_dereference().  This is what
the code really wants, but the helper did not exist when the code was
written.

Also add a fast-path check for task->task_works == NULL: in the likely
case that the task has no pending works, we can avoid taking
task->pi_lock.

While at it, change other users of ACCESS_ONCE() to use READ_ONCE().
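
The fast-path idea can be sketched in user space as follows.  This is an
illustration under stated assumptions, not the kernel implementation:
task_sketch, work_sketch, add_sketch() and cancel_sketch() are
hypothetical names, a C11 atomic_flag spinlock stands in for pi_lock,
and, unlike the kernel (which adds entries with cmpxchg), the adder here
also takes the lock.

```c
#include <stdatomic.h>
#include <stddef.h>

struct work_sketch {
	struct work_sketch *next;
	int id;
};

struct task_sketch {
	_Atomic(struct work_sketch *) works;	/* pending list head */
	atomic_flag lock;			/* stands in for pi_lock */
};

static void lock_sketch(atomic_flag *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void unlock_sketch(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Push a work item onto the head of the list (locked in this sketch). */
static void add_sketch(struct task_sketch *t, struct work_sketch *w)
{
	lock_sketch(&t->lock);
	w->next = atomic_load(&t->works);
	atomic_store(&t->works, w);
	unlock_sketch(&t->lock);
}

/* Remove and return the first entry whose id matches, or NULL. */
static struct work_sketch *cancel_sketch(struct task_sketch *t, int id)
{
	struct work_sketch *prev = NULL, *w;

	/* Fast path: nothing pending, so skip the lock entirely. */
	if (atomic_load(&t->works) == NULL)
		return NULL;

	lock_sketch(&t->lock);
	for (w = atomic_load(&t->works); w; prev = w, w = w->next) {
		if (w->id != id)
			continue;
		if (prev)
			prev->next = w->next;	/* unlink from the middle */
		else
			atomic_store(&t->works, w->next); /* unlink the head */
		break;
	}
	unlock_sketch(&t->lock);
	return w;
}
```

The unlocked check is only an optimization: a canceller that races with
a concurrent add may miss the new entry with or without it.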

Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/task_work.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 6ab4842b00e8..d513051fcca2 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -29,7 +29,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 	struct callback_head *head;
 
 	do {
-		head = ACCESS_ONCE(task->task_works);
+		head = READ_ONCE(task->task_works);
 		if (unlikely(head == &work_exited))
 			return -ESRCH;
 		work->next = head;
@@ -57,6 +57,9 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
 	struct callback_head **pprev = &task->task_works;
 	struct callback_head *work;
 	unsigned long flags;
+
+	if (likely(!task->task_works))
+		return NULL;
 	/*
 	 * If cmpxchg() fails we continue without updating pprev.
 	 * Either we raced with task_work_add() which added the
@@ -64,8 +67,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
 	 * we raced with task_work_run(), *pprev == NULL/exited.
 	 */
 	raw_spin_lock_irqsave(&task->pi_lock, flags);
-	while ((work = ACCESS_ONCE(*pprev))) {
-		smp_read_barrier_depends();
+	while ((work = lockless_dereference(*pprev))) {
 		if (work->func != func)
 			pprev = &work->next;
 		else if (cmpxchg(pprev, work, work->next) == work)
@@ -95,7 +97,7 @@ void task_work_run(void)
 		 * work_exited unless the list is empty.
 		 */
 		do {
-			work = ACCESS_ONCE(task->task_works);
+			work = READ_ONCE(task->task_works);
 			head = !work && (task->flags & PF_EXITING) ?
 				&work_exited : NULL;
 		} while (cmpxchg(&task->task_works, work, head) != work);

From 9d5059c959ac739dbf837cec14586e58e7a67292 Mon Sep 17 00:00:00 2001
From: Luis de Bethencourt <luisbg@osg.samsung.com>
Date: Tue, 2 Aug 2016 14:03:47 -0700
Subject: [PATCH 035/111] dynamic_debug: only add header when used

The kernel.h header doesn't use dynamic debug directly, so drop the
include there and add it to module.c (which was getting it via
kernel.h).  printk.h only uses it when CONFIG_DYNAMIC_DEBUG is on, so
include it only in that case.

Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
[luisbg@osg.samsung.com: include dynamic_debug.h in drbd_int.h]
  Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/block/drbd/drbd_actlog.c | 1 -
 drivers/block/drbd/drbd_int.h    | 1 +
 include/linux/kernel.h           | 1 -
 include/linux/printk.h           | 3 ++-
 kernel/module.c                  | 1 +
 5 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 0a1aaf8c24c4..2d3d50ab74bf 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -27,7 +27,6 @@
 #include <linux/crc32c.h>
 #include <linux/drbd.h>
 #include <linux/drbd_limits.h>
-#include <linux/dynamic_debug.h>
 #include "drbd_int.h"
 
 
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 7b54354976a5..4cb8f21ff4ef 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -41,6 +41,7 @@
 #include <linux/backing-dev.h>
 #include <linux/genhd.h>
 #include <linux/idr.h>
+#include <linux/dynamic_debug.h>
 #include <net/tcp.h>
 #include <linux/lru_cache.h>
 #include <linux/prefetch.h>
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c42082112ec8..d96a6118d26a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -11,7 +11,6 @@
 #include <linux/log2.h>
 #include <linux/typecheck.h>
 #include <linux/printk.h>
-#include <linux/dynamic_debug.h>
 #include <asm/byteorder.h>
 #include <uapi/linux/kernel.h>
 
diff --git a/include/linux/printk.h b/include/linux/printk.h
index f136b22c7772..987c65ed34e5 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -289,10 +289,11 @@ extern asmlinkage void dump_stack(void) __cold;
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
-#include <linux/dynamic_debug.h>
 
 /* If you are writing a driver, please use dev_dbg instead */
 #if defined(CONFIG_DYNAMIC_DEBUG)
+#include <linux/dynamic_debug.h>
+
 /* dynamic_pr_debug() uses pr_fmt() internally so we don't need it here */
 #define pr_debug(fmt, ...) \
 	dynamic_pr_debug(fmt, ##__VA_ARGS__)
diff --git a/kernel/module.c b/kernel/module.c
index 5f71aa63ed2a..a0f48b8b00da 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -60,6 +60,7 @@
 #include <linux/jump_label.h>
 #include <linux/pfn.h>
 #include <linux/bsearch.h>
+#include <linux/dynamic_debug.h>
 #include <uapi/linux/module.h>
 #include "module-internal.h"
 

From bebca05281d039e4144e1c51f402fd1d5f54b5e2 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Tue, 2 Aug 2016 14:03:50 -0700
Subject: [PATCH 036/111] printk: do not include interrupt.h

A trivial cosmetic change: the interrupt.h header has been redundant
since commit 6b898c07cb1d ("console: use might_sleep in console_lock").

Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/printk/printk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d4de33934dac..09af62e71fee 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -26,7 +26,6 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
-#include <linux/interrupt.h>			/* For in_interrupt() */
 #include <linux/delay.h>
 #include <linux/smp.h>
 #include <linux/security.h>

From 874f9c7da9a4acbc1b9e12ca722579fb50e4d142 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:03:53 -0700
Subject: [PATCH 037/111] printk: create pr_<level> functions

Using functions instead of macros can reduce overall code size by
eliminating unnecessary "KERN_SOH<digit>" prefixes from format strings.

  defconfig x86-64:

  $ size vmlinux*
     text    data     bss      dec     hex  filename
  10193570 4331464 1105920 15630954  ee826a vmlinux.new
  10192623 4335560 1105920 15634103  ee8eb7 vmlinux.old

As the return values are unimportant and unused in the kernel tree, these
new functions return void.

Miscellanea:

 - change pr_<level> macros to call new __pr_<level> functions
 - change vprintk_nmi and vprintk_default to add LOGLEVEL_<level> argument
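
The effect on format strings can be sketched in user space like this.  It
is an illustration only: emit_sketch(), last_record, pr_err_old() and
pr_err_new() are hypothetical names, while the KERN_SOH, KERN_ERR,
LOGLEVEL_ERR and LOGLEVEL_DEFAULT values match the kernel headers.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Same values as the kernel's kern_levels.h. */
#define KERN_SOH		"\001"
#define KERN_ERR		KERN_SOH "3"
#define LOGLEVEL_DEFAULT	-1
#define LOGLEVEL_ERR		3

/* Hypothetical sink: keeps the last formatted record for inspection. */
static char last_record[128];

static void emit_sketch(int level, const char *fmt, ...)
{
	va_list args;
	int n = 0;

	/* Rebuild the "KERN_SOH<digit>" prefix from the level argument. */
	if (level != LOGLEVEL_DEFAULT)
		n = snprintf(last_record, sizeof(last_record),
			     KERN_SOH "%c", '0' + level);

	va_start(args, fmt);
	vsnprintf(last_record + n, sizeof(last_record) - n, fmt, args);
	va_end(args);
}

/* Old style: every format string carries the two-byte prefix. */
#define pr_err_old(fmt, ...) \
	emit_sketch(LOGLEVEL_DEFAULT, KERN_ERR fmt, ##__VA_ARGS__)

/* New style: the level is an argument and the string stays bare. */
#define pr_err_new(fmt, ...) \
	emit_sketch(LOGLEVEL_ERR, fmt, ##__VA_ARGS__)
```

Both macros produce the same record, but each string literal passed
through the old form is two bytes longer, which is where a saving of the
kind described above would come from.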

[akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/printk.h   | 48 +++++++++++++++++++++++++++-------------
 kernel/printk/internal.h | 16 +++++++++-----
 kernel/printk/nmi.c      | 13 +++++++++--
 kernel/printk/printk.c   | 27 +++++++++++++++++++---
 4 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 987c65ed34e5..c2158f0f1499 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -257,21 +257,39 @@ extern asmlinkage void dump_stack(void) __cold;
  * and other debug macros are compiled out unless either DEBUG is defined
  * or CONFIG_DYNAMIC_DEBUG is set.
  */
-#define pr_emerg(fmt, ...) \
-	printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_alert(fmt, ...) \
-	printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_crit(fmt, ...) \
-	printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_err(fmt, ...) \
-	printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_warning(fmt, ...) \
-	printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_warn pr_warning
-#define pr_notice(fmt, ...) \
-	printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_info(fmt, ...) \
-	printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+#ifdef CONFIG_PRINTK
+
+asmlinkage __printf(1, 2) __cold void __pr_emerg(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_alert(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_crit(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_err(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_warn(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_notice(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_info(const char *fmt, ...);
+
+#define pr_emerg(fmt, ...)	__pr_emerg(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert(fmt, ...)	__pr_alert(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)	__pr_crit(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)	__pr_err(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn(fmt, ...)	__pr_warn(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice(fmt, ...)	__pr_notice(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...)	__pr_info(pr_fmt(fmt), ##__VA_ARGS__)
+
+#else
+
+#define pr_emerg(fmt, ...)	printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert(fmt, ...)	printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)	printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)	printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn(fmt, ...)	printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice(fmt, ...)	printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...)	printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+#endif
+
+#define pr_warning pr_warn
+
 /*
  * Like KERN_CONT, pr_cont() should only be used when continuing
  * a line with no newline ('\n') enclosed. Otherwise it defaults
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838fa417..5d4505f30083 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -16,9 +16,11 @@
  */
 #include <linux/percpu.h>
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+typedef __printf(2, 0) int (*printk_func_t)(int level, const char *fmt,
+					    va_list args);
 
-int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+__printf(2, 0)
+int vprintk_default(int level, const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
@@ -31,9 +33,10 @@ extern raw_spinlock_t logbuf_lock;
  * via per-CPU variable.
  */
 DECLARE_PER_CPU(printk_func_t, printk_func);
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+__printf(2, 0)
+static inline int vprintk_func(int level, const char *fmt, va_list args)
 {
-	return this_cpu_read(printk_func)(fmt, args);
+	return this_cpu_read(printk_func)(level, fmt, args);
 }
 
 extern atomic_t nmi_message_lost;
@@ -44,9 +47,10 @@ static inline int get_nmi_message_lost(void)
 
 #else /* CONFIG_PRINTK_NMI */
 
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+__printf(2, 0)
+static inline int vprintk_func(int level, const char *fmt, va_list args)
 {
-	return vprintk_default(fmt, args);
+	return vprintk_default(level, fmt, args);
 }
 
 static inline int get_nmi_message_lost(void)
diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c
index b69eb8a2876f..bc3eeb1ae6da 100644
--- a/kernel/printk/nmi.c
+++ b/kernel/printk/nmi.c
@@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);
  * one writer running. But the buffer might get flushed from another
  * CPU, so we need to be careful.
  */
-static int vprintk_nmi(const char *fmt, va_list args)
+static int vprintk_nmi(int level, const char *fmt, va_list args)
 {
 	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 	int add = 0;
@@ -79,7 +79,16 @@ again:
 	if (!len)
 		smp_rmb();
 
-	add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args);
+	if (level != LOGLEVEL_DEFAULT) {
+		add = snprintf(s->buffer + len, sizeof(s->buffer) - len,
+				KERN_SOH "%c", '0' + level);
+		add += vsnprintf(s->buffer + len + add,
+				 sizeof(s->buffer) - len - add,
+				 fmt, args);
+	} else {
+		add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len,
+				fmt, args);
+	}
 
 	/*
 	 * Do it once again if the buffer has been flushed in the meantime.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 09af62e71fee..d2accf2f4448 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1801,7 +1801,28 @@ asmlinkage int printk_emit(int facility, int level,
 }
 EXPORT_SYMBOL(printk_emit);
 
-int vprintk_default(const char *fmt, va_list args)
+#ifdef CONFIG_PRINTK
+#define define_pr_level(func, loglevel)				\
+asmlinkage __visible void func(const char *fmt, ...)		\
+{								\
+	va_list args;						\
+								\
+	va_start(args, fmt);					\
+	vprintk_default(loglevel, fmt, args);			\
+	va_end(args);						\
+}								\
+EXPORT_SYMBOL(func)
+
+define_pr_level(__pr_emerg, LOGLEVEL_EMERG);
+define_pr_level(__pr_alert, LOGLEVEL_ALERT);
+define_pr_level(__pr_crit, LOGLEVEL_CRIT);
+define_pr_level(__pr_err, LOGLEVEL_ERR);
+define_pr_level(__pr_warn, LOGLEVEL_WARNING);
+define_pr_level(__pr_notice, LOGLEVEL_NOTICE);
+define_pr_level(__pr_info, LOGLEVEL_INFO);
+#endif
+
+int vprintk_default(int level, const char *fmt, va_list args)
 {
 	int r;
 
@@ -1811,7 +1832,7 @@ int vprintk_default(const char *fmt, va_list args)
 		return r;
 	}
 #endif
-	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
+	r = vprintk_emit(0, level, NULL, 0, fmt, args);
 
 	return r;
 }
@@ -1844,7 +1865,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
 	int r;
 
 	va_start(args, fmt);
-	r = vprintk_func(fmt, args);
+	r = vprintk_func(LOGLEVEL_DEFAULT, fmt, args);
 	va_end(args);
 
 	return r;

From cf7754441c563230ed74096fcd4b8cca49910550 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Tue, 2 Aug 2016 14:03:56 -0700
Subject: [PATCH 038/111] printk: introduce suppress_message_printing()

Message levels and the console log level are inspected only when the
actual printing occurs, which may cause console_unlock() and
console_cont_flush() to waste CPU cycles on every message whose
loglevel is at or above the current console_loglevel.

Schematically, console_unlock() does the following:

console_unlock()
{
        ...
        for (;;) {
                ...
                raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
                msg = log_from_idx(console_idx);

                if (msg->flags & LOG_NOCONS) {
                        ...
                        goto skip;
                }

                level = msg->level;
                len += msg_print_text();                        >> sprintfs
                                                                   memcpy,
                                                                   etc.

                if (nr_ext_console_drivers) {
                        ext_len = msg_print_ext_header();       >> scnprintf
                        ext_len += msg_print_ext_body();        >> scnprintfs
                                                                   etc.
                }
                ...
                raw_spin_unlock(&logbuf_lock);

                call_console_drivers(level, ext_text, ext_len, text, len)
                {
                        if (level >= console_loglevel &&        >> drop the message
                                        !ignore_loglevel)
                                return;

                        console->write(...);
                }

                local_irq_restore(flags);
        }
        ...
}

The problem is this deferred `level >= console_loglevel' check.  We
waste CPU cycles on sprintfs/memcpy/etc. to prepare messages that we
will eventually drop.

This can be expensive when we register a new CON_PRINTBUFFER console,
for instance.  For every such console, register_console() resets the

        console_seq, console_idx, console_prev

and sets an `exclusive console' pointer to replay the log buffer to that
just-registered console.  There can be a lot of messages to replay, and
in the worst case most of them are dropped after the console_loglevel
test.

We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.

The patch factors out the loglevel check into a
suppress_message_printing() function and tests message->level against
console_loglevel before the formatting functions in console_unlock()
and console_cont_flush() are executed.  This improves things not only
for exclusive CON_PRINTBUFFER consoles, but for every console_unlock()
that attempts to print a message of level above the console_loglevel.
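
As a rough userspace illustration of the idea (not the kernel code
itself), the sketch below tests the level first and formats a record
only if it will reach the console.  The predicate mirrors the one in
this patch; struct demo_msg, demo_flush() and the initial
console_loglevel value are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel globals of the same name (assumed values). */
static int console_loglevel = 4;
static bool ignore_loglevel;

/* Same predicate as the helper introduced by this patch. */
static bool suppress_message_printing(int level)
{
	return level >= console_loglevel && !ignore_loglevel;
}

/* Hypothetical record type; the real struct printk_log is richer. */
struct demo_msg {
	int level;
	const char *text;
};

/*
 * Format into buf only the records that will reach the console.
 * Returns the number of records that were formatted.
 */
static int demo_flush(const struct demo_msg *msgs, int n,
		      char *buf, size_t size)
{
	size_t len = 0;
	int formatted = 0;

	if (size)
		buf[0] = '\0';

	for (int i = 0; i < n; i++) {
		int ret;

		/* Cheap integer test first: skip before any formatting. */
		if (suppress_message_printing(msgs[i].level))
			continue;

		ret = snprintf(buf + len, size - len, "<%d>%s\n",
			       msgs[i].level, msgs[i].text);
		if (ret < 0 || (size_t)ret >= size - len)
			break;	/* stop once the buffer is full */
		len += ret;
		formatted++;
	}
	return formatted;
}
```

With console_loglevel at 4, records of level 6 or 7 never reach
snprintf() at all, which is the saving described above.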

Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/printk/printk.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d2accf2f4448..8bdce14254f4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -985,6 +985,11 @@ module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(ignore_loglevel,
 		 "ignore loglevel setting (prints all kernel messages to the console)");
 
+static bool suppress_message_printing(int level)
+{
+	return (level >= console_loglevel && !ignore_loglevel);
+}
+
 #ifdef CONFIG_BOOT_PRINTK_DELAY
 
 static int boot_delay; /* msecs delay after each printk during bootup */
@@ -1014,7 +1019,7 @@ static void boot_delay_msec(int level)
 	unsigned long timeout;
 
 	if ((boot_delay == 0 || system_state != SYSTEM_BOOTING)
-		|| (level >= console_loglevel && !ignore_loglevel)) {
+		|| suppress_message_printing(level)) {
 		return;
 	}
 
@@ -1438,8 +1443,6 @@ static void call_console_drivers(int level,
 
 	trace_console(text, len);
 
-	if (level >= console_loglevel && !ignore_loglevel)
-		return;
 	if (!console_drivers)
 		return;
 
@@ -1908,6 +1911,7 @@ static void call_console_drivers(int level,
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			     bool syslog, char *buf, size_t size) { return 0; }
 static size_t cont_print_text(char *text, size_t size) { return 0; }
+static bool suppress_message_printing(int level) { return false; }
 
 /* Still needs to be defined for users */
 DEFINE_PER_CPU(printk_func_t, printk_func);
@@ -2187,6 +2191,13 @@ static void console_cont_flush(char *text, size_t size)
 	if (!cont.len)
 		goto out;
 
+	if (suppress_message_printing(cont.level)) {
+		cont.cons = cont.len;
+		if (cont.flushed)
+			cont.len = 0;
+		goto out;
+	}
+
 	/*
 	 * We still queue earlier records, likely because the console was
 	 * busy. The earlier ones need to be printed before this one, we
@@ -2290,10 +2301,13 @@ skip:
 			break;
 
 		msg = log_from_idx(console_idx);
-		if (msg->flags & LOG_NOCONS) {
+		level = msg->level;
+		if ((msg->flags & LOG_NOCONS) ||
+				suppress_message_printing(level)) {
 			/*
 			 * Skip record we have buffered and already printed
-			 * directly to the console when we received it.
+			 * directly to the console when we received it, and
+			 * record that has level above the console loglevel.
 			 */
 			console_idx = log_next(console_idx);
 			console_seq++;
@@ -2307,7 +2321,6 @@ skip:
 			goto skip;
 		}
 
-		level = msg->level;
 		len += msg_print_text(msg, console_prev, false,
 				      text + len, sizeof(text) - len);
 		if (nr_ext_console_drivers) {

From 40a7d9f5f90681c6d7890b6a07f230bb4afe7e39 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 2 Aug 2016 14:03:59 -0700
Subject: [PATCH 039/111] printk: include <asm/sections.h> instead of
 <asm-generic/sections.h>

asm-generic headers are generic implementations for architecture
specific code and should not be included by common code.  Thus use the
asm/ version of sections.h to get at the linker sections.

Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/printk/printk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8bdce14254f4..70c66c5ba212 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -47,7 +47,7 @@
 #include <linux/uio.h>
 
 #include <asm/uaccess.h>
-#include <asm-generic/sections.h>
+#include <asm/sections.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/printk.h>

From b5644a153d2701ffc335cfb9ef49967bd5b6a3c2 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Tue, 2 Aug 2016 14:04:01 -0700
Subject: [PATCH 040/111] fbdev/bfin_adv7393fb: move DRIVER_NAME before its
 first use

Move the DRIVER_NAME macro definition before the first usage site and
fix build error.

Link: http://lkml.kernel.org/r/20160801163937.GA28119@nazgul.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/video/fbdev/bfin_adv7393fb.c | 2 ++
 drivers/video/fbdev/bfin_adv7393fb.h | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/video/fbdev/bfin_adv7393fb.c b/drivers/video/fbdev/bfin_adv7393fb.c
index 8fe41caac38e..e2d7d039ce3b 100644
--- a/drivers/video/fbdev/bfin_adv7393fb.c
+++ b/drivers/video/fbdev/bfin_adv7393fb.c
@@ -10,6 +10,8 @@
  * TODO: Code Cleanup
  */
 
+#define DRIVER_NAME "bfin-adv7393"
+
 #define pr_fmt(fmt) DRIVER_NAME ": " fmt
 
 #include <linux/module.h>
diff --git a/drivers/video/fbdev/bfin_adv7393fb.h b/drivers/video/fbdev/bfin_adv7393fb.h
index cd591b5152a5..afd0380e19e1 100644
--- a/drivers/video/fbdev/bfin_adv7393fb.h
+++ b/drivers/video/fbdev/bfin_adv7393fb.h
@@ -59,8 +59,6 @@ enum {
 	BLANK_OFF,
 };
 
-#define DRIVER_NAME "bfin-adv7393"
-
 struct adv7393fb_modes {
 	const s8 name[25];	/* Full name */
 	u16 xres;		/* Active Horizonzal Pixels  */

From 6b1d174b0c27b5de421eda55c2731f32b6bd9852 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Tue, 2 Aug 2016 14:04:04 -0700
Subject: [PATCH 041/111] ratelimit: extend to print suppressed messages on
 release
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extend the ratelimiting facility to print the number of suppressed
lines when it is released.

This use case is aimed at short-lived, burst-like users for which we
want to output the suppressed-lines stats only once, after the
ratelimit state has been disposed of.  For an example, see /dev/kmsg
usage in a follow-on patch.

Also, change the printk() line we issue on release to not use
"callbacks" as it is misleading: we're not suppressing callbacks but
printk() calls.

This has been separated from a previous patch by Linus.
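
The intended behaviour can be modelled in userspace roughly as follows.
This is a simplified sketch, not the kernel implementation: it has no
locking, time is passed in as an abstract tick count instead of jiffies
(ticks are assumed to start at 1), and all demo_* names are invented
for the example.

```c
#include <stdio.h>

#define DEMO_MSG_ON_RELEASE	(1UL << 0)	/* models RATELIMIT_MSG_ON_RELEASE */

struct demo_ratelimit {
	long interval;		/* window length in ticks */
	int burst;		/* lines allowed per window */
	int printed;
	int missed;
	long begin;
	unsigned long flags;
};

/* Returns 1 if the caller may print, 0 if the line is suppressed. */
static int demo_ratelimit(struct demo_ratelimit *rs, long now)
{
	if (!rs->interval)
		return 1;

	if (!rs->begin)
		rs->begin = now;

	if (now > rs->begin + rs->interval) {
		/* Report per window only when the release flag is clear. */
		if (rs->missed && !(rs->flags & DEMO_MSG_ON_RELEASE)) {
			printf("%d callbacks suppressed\n", rs->missed);
			rs->missed = 0;
		}
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->burst > rs->printed) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}

/* Report the accumulated count once, at release. Returns that count. */
static int demo_ratelimit_exit(struct demo_ratelimit *rs)
{
	int reported = 0;

	if (!(rs->flags & DEMO_MSG_ON_RELEASE))
		return 0;

	if (rs->missed) {
		printf("%d output lines suppressed due to ratelimiting\n",
		       rs->missed);
		reported = rs->missed;
		rs->missed = 0;
	}
	return reported;
}
```

With the flag set, the missed counter keeps accumulating across windows
and is reported a single time from demo_ratelimit_exit().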

Link: http://lkml.kernel.org/r/20160716061745.15795-2-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Franck Bui <fbui@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/ratelimit.h | 38 +++++++++++++++++++++++++++++++++-----
 lib/ratelimit.c           | 10 ++++++----
 2 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
index 18102529254e..57c9e0622a38 100644
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -2,11 +2,15 @@
 #define _LINUX_RATELIMIT_H
 
 #include <linux/param.h>
+#include <linux/sched.h>
 #include <linux/spinlock.h>
 
 #define DEFAULT_RATELIMIT_INTERVAL	(5 * HZ)
 #define DEFAULT_RATELIMIT_BURST		10
 
+/* issue num suppressed message on exit */
+#define RATELIMIT_MSG_ON_RELEASE	BIT(0)
+
 struct ratelimit_state {
 	raw_spinlock_t	lock;		/* protect the state */
 
@@ -15,6 +19,7 @@ struct ratelimit_state {
 	int		printed;
 	int		missed;
 	unsigned long	begin;
+	unsigned long	flags;
 };
 
 #define RATELIMIT_STATE_INIT(name, interval_init, burst_init) {		\
@@ -34,12 +39,35 @@ struct ratelimit_state {
 static inline void ratelimit_state_init(struct ratelimit_state *rs,
 					int interval, int burst)
 {
+	memset(rs, 0, sizeof(*rs));
+
 	raw_spin_lock_init(&rs->lock);
-	rs->interval = interval;
-	rs->burst = burst;
-	rs->printed = 0;
-	rs->missed = 0;
-	rs->begin = 0;
+	rs->interval	= interval;
+	rs->burst	= burst;
+}
+
+static inline void ratelimit_default_init(struct ratelimit_state *rs)
+{
+	return ratelimit_state_init(rs, DEFAULT_RATELIMIT_INTERVAL,
+					DEFAULT_RATELIMIT_BURST);
+}
+
+static inline void ratelimit_state_exit(struct ratelimit_state *rs)
+{
+	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+		return;
+
+	if (rs->missed) {
+		pr_warn("%s: %d output lines suppressed due to ratelimiting\n",
+			current->comm, rs->missed);
+		rs->missed = 0;
+	}
+}
+
+static inline void
+ratelimit_set_flags(struct ratelimit_state *rs, unsigned long flags)
+{
+	rs->flags = flags;
 }
 
 extern struct ratelimit_state printk_ratelimit_state;
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index 2c5de86460c5..08f8043cac61 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
 		rs->begin = jiffies;
 
 	if (time_is_before_jiffies(rs->begin + rs->interval)) {
-		if (rs->missed)
-			printk(KERN_WARNING "%s: %d callbacks suppressed\n",
-				func, rs->missed);
+		if (rs->missed) {
+			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
+				pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
+				rs->missed = 0;
+			}
+		}
 		rs->begin   = jiffies;
 		rs->printed = 0;
-		rs->missed  = 0;
 	}
 	if (rs->burst && rs->burst > rs->printed) {
 		rs->printed++;

From 750afe7babd117daabebf4855da18e4418ea845e Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Tue, 2 Aug 2016 14:04:07 -0700
Subject: [PATCH 042/111] printk: add kernel parameter to control writes to
 /dev/kmsg
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg.  It has three options:

 * ratelimit - ratelimit logging from userspace.
 * on  - unlimited logging from userspace
 * off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

This changes the kernel default setting from "on" to "ratelimit"; we do
that because we want to keep userspace spamming of /dev/kmsg at sane
levels.  This is especially relevant when a small kernel log buffer
wraps around and messages get lost.  With ratelimiting as the default,
kernel messages have a somewhat higher chance of surviving all the
spamming.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.

That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line.  If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).
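
The interaction between the two interfaces can be sketched in userspace
like this.  It is a simplified model under stated assumptions: the
demo_* names are invented, the sysctl path takes a plain C string
instead of going through proc_dostring(), and -1 stands in for -EINVAL.

```c
#include <string.h>

enum {
	DEMO_MASK_ON	= 1u << 0,
	DEMO_MASK_OFF	= 1u << 1,
	DEMO_MASK_LOCK	= 1u << 2,
};

/* Both 'on' and 'off' bits clear means "ratelimit", the default. */
static unsigned int demo_devkmsg_log;

/* Parse a setting; returns the matched length or -1 if unknown. */
static int demo_parse(const char *str)
{
	if (!strncmp(str, "on", 2)) {
		demo_devkmsg_log = DEMO_MASK_ON;
		return 2;
	} else if (!strncmp(str, "off", 3)) {
		demo_devkmsg_log = DEMO_MASK_OFF;
		return 3;
	} else if (!strncmp(str, "ratelimit", 9)) {
		demo_devkmsg_log = 0;
		return 9;
	}
	return -1;
}

/* Command line path: apply the setting, then lock it for good. */
static void demo_cmdline(const char *str)
{
	if (demo_parse(str) < 0)
		return;
	demo_devkmsg_log |= DEMO_MASK_LOCK;
}

/* sysctl path: refused once locked; trailing junk restores the old value. */
static int demo_sysctl_write(const char *str)
{
	unsigned int old = demo_devkmsg_log;
	int n;

	if (demo_devkmsg_log & DEMO_MASK_LOCK)
		return -1;

	n = demo_parse(str);
	if (n < 0 || str[n] != '\0') {
		demo_devkmsg_log = old;
		return -1;
	}
	return 0;
}
```

For instance, demo_sysctl_write("off") succeeds on a system booted
without the parameter, while any write fails after demo_cmdline("on").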

This patch is based on previous patches from Linus and Steven.

[bp@suse.de: fixes]
  Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Franck Bui <fbui@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/kernel-parameters.txt |   7 ++
 Documentation/sysctl/kernel.txt     |  14 +++
 include/linux/printk.h              |   9 ++
 kernel/printk/printk.c              | 142 ++++++++++++++++++++++++++--
 kernel/sysctl.c                     |   7 ++
 5 files changed, 171 insertions(+), 8 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e24aa11e8f8a..b240540e49f2 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3173,6 +3173,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
+	printk.devkmsg={on,off,ratelimit}
+			Control writing to /dev/kmsg.
+			on - unlimited logging to /dev/kmsg from userspace
+			off - logging to /dev/kmsg disabled
+			ratelimit - ratelimit the logging
+			Default: ratelimit
+
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 33204604de6c..ffab8b5caa60 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -764,6 +764,20 @@ send before ratelimiting kicks in.
 
 ==============================================================
 
+printk_devkmsg:
+
+Control the logging to /dev/kmsg from userspace:
+
+ratelimit: default, ratelimited
+on: unlimited logging to /dev/kmsg from userspace
+off: logging to /dev/kmsg disabled
+
+The kernel command line parameter printk.devkmsg= overrides this and is
+a one-time setting until next reboot: once set, it cannot be changed by
+this sysctl interface anymore.
+
+==============================================================
+
 randomize_va_space:
 
 This option can be used to select the type of process address
diff --git a/include/linux/printk.h b/include/linux/printk.h
index c2158f0f1499..8dc155dab3ed 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -61,6 +61,11 @@ static inline void console_verbose(void)
 		console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
 }
 
+/* strlen("ratelimit") + 1 */
+#define DEVKMSG_STR_MAX_SIZE 10
+extern char devkmsg_log_str[];
+struct ctl_table;
+
 struct va_format {
 	const char *fmt;
 	va_list *va;
@@ -175,6 +180,10 @@ extern int printk_delay_msec;
 extern int dmesg_restrict;
 extern int kptr_restrict;
 
+extern int
+devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
+			  size_t *lenp, loff_t *ppos);
+
 extern void wake_up_klogd(void);
 
 char *log_buf_addr_get(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 70c66c5ba212..a5ef95ca18c9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,6 +85,111 @@ static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+enum devkmsg_log_bits {
+	__DEVKMSG_LOG_BIT_ON = 0,
+	__DEVKMSG_LOG_BIT_OFF,
+	__DEVKMSG_LOG_BIT_LOCK,
+};
+
+enum devkmsg_log_masks {
+	DEVKMSG_LOG_MASK_ON             = BIT(__DEVKMSG_LOG_BIT_ON),
+	DEVKMSG_LOG_MASK_OFF            = BIT(__DEVKMSG_LOG_BIT_OFF),
+	DEVKMSG_LOG_MASK_LOCK           = BIT(__DEVKMSG_LOG_BIT_LOCK),
+};
+
+/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
+#define DEVKMSG_LOG_MASK_DEFAULT	0
+
+static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+
+static int __control_devkmsg(char *str)
+{
+	if (!str)
+		return -EINVAL;
+
+	if (!strncmp(str, "on", 2)) {
+		devkmsg_log = DEVKMSG_LOG_MASK_ON;
+		return 2;
+	} else if (!strncmp(str, "off", 3)) {
+		devkmsg_log = DEVKMSG_LOG_MASK_OFF;
+		return 3;
+	} else if (!strncmp(str, "ratelimit", 9)) {
+		devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+		return 9;
+	}
+	return -EINVAL;
+}
+
+static int __init control_devkmsg(char *str)
+{
+	if (__control_devkmsg(str) < 0)
+		return 1;
+
+	/*
+	 * Set sysctl string accordingly:
+	 */
+	if (devkmsg_log == DEVKMSG_LOG_MASK_ON) {
+		memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
+		strncpy(devkmsg_log_str, "on", 2);
+	} else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) {
+		memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
+		strncpy(devkmsg_log_str, "off", 3);
+	}
+	/* else "ratelimit" which is set by default. */
+
+	/*
+	 * Sysctl cannot change it anymore. The kernel command line setting of
+	 * this parameter is to force the setting to be permanent throughout the
+	 * runtime of the system. This is a precaution measure against userspace
+	 * trying to be a smarta** and attempting to change it up on us.
+	 */
+	devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;
+
+	return 0;
+}
+__setup("printk.devkmsg=", control_devkmsg);
+
+char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";
+
+int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	char old_str[DEVKMSG_STR_MAX_SIZE];
+	unsigned int old;
+	int err;
+
+	if (write) {
+		if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK)
+			return -EINVAL;
+
+		old = devkmsg_log;
+		strncpy(old_str, devkmsg_log_str, DEVKMSG_STR_MAX_SIZE);
+	}
+
+	err = proc_dostring(table, write, buffer, lenp, ppos);
+	if (err)
+		return err;
+
+	if (write) {
+		err = __control_devkmsg(devkmsg_log_str);
+
+		/*
+		 * Do not accept an unknown string OR a known string with
+		 * trailing crap...
+		 */
+		if (err < 0 || (err + 1 != *lenp)) {
+
+			/* ... and restore old setting. */
+			devkmsg_log = old;
+			strncpy(devkmsg_log_str, old_str, DEVKMSG_STR_MAX_SIZE);
+
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Number of registered extended console drivers.
  *
@@ -613,6 +718,7 @@ struct devkmsg_user {
 	u64 seq;
 	u32 idx;
 	enum log_flags prev;
+	struct ratelimit_state rs;
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
 };
@@ -622,11 +728,24 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	char *buf, *line;
 	int level = default_message_loglevel;
 	int facility = 1;	/* LOG_USER */
+	struct file *file = iocb->ki_filp;
+	struct devkmsg_user *user = file->private_data;
 	size_t len = iov_iter_count(from);
 	ssize_t ret = len;
 
-	if (len > LOG_LINE_MAX)
+	if (!user || len > LOG_LINE_MAX)
 		return -EINVAL;
+
+	/* Ignore when user logging is disabled. */
+	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+		return len;
+
+	/* Ratelimit when not explicitly enabled. */
+	if (!(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
+		if (!___ratelimit(&user->rs, current->comm))
+			return ret;
+	}
+
 	buf = kmalloc(len+1, GFP_KERNEL);
 	if (buf == NULL)
 		return -ENOMEM;
@@ -799,19 +918,24 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	struct devkmsg_user *user;
 	int err;
 
-	/* write-only does not need any file context */
-	if ((file->f_flags & O_ACCMODE) == O_WRONLY)
-		return 0;
+	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+		return -EPERM;
 
-	err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
-				       SYSLOG_FROM_READER);
-	if (err)
-		return err;
+	/* write-only does not need any file context */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+		err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
+					       SYSLOG_FROM_READER);
+		if (err)
+			return err;
+	}
 
 	user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
 	if (!user)
 		return -ENOMEM;
 
+	ratelimit_default_init(&user->rs);
+	ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
+
 	mutex_init(&user->lock);
 
 	raw_spin_lock_irq(&logbuf_lock);
@@ -830,6 +954,8 @@ static int devkmsg_release(struct inode *inode, struct file *file)
 	if (!user)
 		return 0;
 
+	ratelimit_state_exit(&user->rs);
+
 	mutex_destroy(&user->lock);
 	kfree(user);
 	return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 53954631a4e1..b43d0b27c1fe 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -813,6 +813,13 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &ten_thousand,
 	},
+	{
+		.procname	= "printk_devkmsg",
+		.data		= devkmsg_log_str,
+		.maxlen		= DEVKMSG_STR_MAX_SIZE,
+		.mode		= 0644,
+		.proc_handler	= devkmsg_sysctl_set_loglvl,
+	},
 	{
 		.procname	= "dmesg_restrict",
 		.data		= &dmesg_restrict,

From 4cad35a7ca690eabf0d241062ce9e59693ec03e7 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:04:10 -0700
Subject: [PATCH 043/111] get_maintainer.pl: reduce need for command-line
 option -f

If a VCS is in use, check whether it tracks the specified file; if it
does, the -f option becomes optional.

Link: http://lkml.kernel.org/r/7c86a8df0d48770c45778a43b6b3e4627b2a90ee.1469746395.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/get_maintainer.pl | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index 1873421f2305..122fcdaf42c8 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -133,6 +133,7 @@ my %VCS_cmds_git = (
     "author_pattern" => "^GitAuthor: (.*)",
     "subject_pattern" => "^GitSubject: (.*)",
     "stat_pattern" => "^(\\d+)\\t(\\d+)\\t\$file\$",
+    "file_exists_cmd" => "git ls-files \$file",
 );
 
 my %VCS_cmds_hg = (
@@ -161,6 +162,7 @@ my %VCS_cmds_hg = (
     "author_pattern" => "^HgAuthor: (.*)",
     "subject_pattern" => "^HgSubject: (.*)",
     "stat_pattern" => "^(\\d+)\t(\\d+)\t\$file\$",
+    "file_exists_cmd" => "hg files \$file",
 );
 
 my $conf = which_conf(".get_maintainer.conf");
@@ -430,7 +432,7 @@ foreach my $file (@ARGV) {
 	    die "$P: file '${file}' not found\n";
 	}
     }
-    if ($from_filename) {
+    if ($from_filename || vcs_file_exists($file)) {
 	$file =~ s/^\Q${cur_path}\E//;	#strip any absolute path
 	$file =~ s/^\Q${lk_path}\E//;	#or the path to the lk tree
 	push(@files, $file);
@@ -2124,6 +2126,22 @@ sub vcs_file_blame {
     }
 }
 
+sub vcs_file_exists {
+    my ($file) = @_;
+
+    my $exists;
+
+    my $vcs_used = vcs_exists();
+    return 0 if (!$vcs_used);
+
+    my $cmd = $VCS_cmds{"file_exists_cmd"};
+    $cmd =~ s/(\$\w+)/$1/eeg;		# interpolate $cmd
+
+    $exists = &{$VCS_cmds{"execute_cmd"}}($cmd);
+
+    return $exists;
+}
+
 sub uniq {
     my (@parms) = @_;
 

From f003a1f182bb821f13775338a4bf8711830f927a Mon Sep 17 00:00:00 2001
From: Sebastian Ott <sebott@linux.vnet.ibm.com>
Date: Tue, 2 Aug 2016 14:04:13 -0700
Subject: [PATCH 044/111] lib/iommu-helper: skip to next segment

When a large enough area in the iommu bitmap is found but would span a
boundary we continue the search starting from the next bit position.
For large allocations this can lead to several useless invocations of
bitmap_find_next_zero_area() and iommu_is_span_boundary().

Continue the search from the start of the next segment (which is the
next bit position such that we'll not cross the same segment boundary
again).
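
A small worked example of the arithmetic, as a userspace sketch.  The
demo_* helpers are written for this illustration; the boundary test
follows the description above and assumes boundary_size is a power of
two.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define DEMO_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Does [index, index + nr) cross a segment boundary? */
static int demo_spans_boundary(unsigned long index, unsigned long nr,
			       unsigned long shift,
			       unsigned long boundary_size)
{
	unsigned long off;

	assert(boundary_size && !(boundary_size & (boundary_size - 1)));

	/* Offset of the area's first bit within its segment. */
	off = (shift + index) & (boundary_size - 1);
	return off + nr > boundary_size;
}

/* First bit position of the segment following the one holding index. */
static unsigned long demo_next_start(unsigned long index,
				     unsigned long shift,
				     unsigned long boundary_size)
{
	return DEMO_ALIGN(shift + index, boundary_size) - shift;
}
```

With boundary_size = 8, shift = 0, index = 6 and nr = 4, the area would
cover bits 6..9 and cross the boundary at 8.  The old code retried from
7, which crosses the same boundary again; the new start is 8.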

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1606081910070.3211@schleppi
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/iommu-helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/iommu-helper.c b/lib/iommu-helper.c
index c27e269210c4..a816f3a80625 100644
--- a/lib/iommu-helper.c
+++ b/lib/iommu-helper.c
@@ -29,8 +29,7 @@ again:
 	index = bitmap_find_next_zero_area(map, size, start, nr, align_mask);
 	if (index < size) {
 		if (iommu_is_span_boundary(index, nr, shift, boundary_size)) {
-			/* we could do more effectively */
-			start = index + 1;
+			start = ALIGN(shift + index, boundary_size) - shift;
 			goto again;
 		}
 		bitmap_set(map, index, nr);

From a9bfd3321713ecec86282dd2bec04212189f91f1 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 2 Aug 2016 14:04:16 -0700
Subject: [PATCH 045/111] crc32: use ktime_get_ns() for measurement

The crc32 test function measures the elapsed time in nanoseconds, but
uses 'struct timespec' for that.  We want to remove timespec from the
kernel for y2038 compatibility, and ktime_get_ns() also helps make the
code simpler here.

It is also slightly better to use monotonic time, as we are only
interested in the time difference.
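
The same measurement pattern can be tried in userspace; the sketch
below is only an analogue.  It assumes POSIX clock_gettime() with
CLOCK_MONOTONIC as a stand-in for ktime_get_ns(), and demo_get_ns() and
demo_time_loop() are names invented for the example.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading as a single 64-bit nanosecond count. */
static uint64_t demo_get_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time a dummy loop the way the test does: one variable, one subtraction. */
static uint64_t demo_time_loop(unsigned int iterations)
{
	volatile unsigned int sink = 0;
	uint64_t nsec;

	nsec = demo_get_ns();
	for (unsigned int i = 0; i < iterations; i++)
		sink += i;
	nsec = demo_get_ns() - nsec;

	return nsec;
}
```

Keeping the value as one 64-bit count removes the separate seconds and
nanoseconds arithmetic that the old code needed.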

Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S . Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/crc32.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/lib/crc32.c b/lib/crc32.c
index 9a907d489d95..7fbd1a112b9d 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -979,7 +979,6 @@ static int __init crc32c_test(void)
 	int i;
 	int errors = 0;
 	int bytes = 0;
-	struct timespec start, stop;
 	u64 nsec;
 	unsigned long flags;
 
@@ -999,20 +998,17 @@ static int __init crc32c_test(void)
 	local_irq_save(flags);
 	local_irq_disable();
 
-	getnstimeofday(&start);
+	nsec = ktime_get_ns();
 	for (i = 0; i < 100; i++) {
 		if (test[i].crc32c_le != __crc32c_le(test[i].crc, test_buf +
 		    test[i].start, test[i].length))
 			errors++;
 	}
-	getnstimeofday(&stop);
+	nsec = ktime_get_ns() - nsec;
 
 	local_irq_restore(flags);
 	local_irq_enable();
 
-	nsec = stop.tv_nsec - start.tv_nsec +
-		1000000000 * (stop.tv_sec - start.tv_sec);
-
 	pr_info("crc32c: CRC_LE_BITS = %d\n", CRC_LE_BITS);
 
 	if (errors)
@@ -1065,7 +1061,6 @@ static int __init crc32_test(void)
 	int i;
 	int errors = 0;
 	int bytes = 0;
-	struct timespec start, stop;
 	u64 nsec;
 	unsigned long flags;
 
@@ -1088,7 +1083,7 @@ static int __init crc32_test(void)
 	local_irq_save(flags);
 	local_irq_disable();
 
-	getnstimeofday(&start);
+	nsec = ktime_get_ns();
 	for (i = 0; i < 100; i++) {
 		if (test[i].crc_le != crc32_le(test[i].crc, test_buf +
 		    test[i].start, test[i].length))
@@ -1098,14 +1093,11 @@ static int __init crc32_test(void)
 		    test[i].start, test[i].length))
 			errors++;
 	}
-	getnstimeofday(&stop);
+	nsec = ktime_get_ns() - nsec;
 
 	local_irq_restore(flags);
 	local_irq_enable();
 
-	nsec = stop.tv_nsec - start.tv_nsec +
-		1000000000 * (stop.tv_sec - start.tv_sec);
-
 	pr_info("crc32: CRC_LE_BITS = %d, CRC_BE BITS = %d\n",
 		 CRC_LE_BITS, CRC_BE_BITS);
 

From a23216a2f1f8a30a3b6588c743681651e4a6aa94 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Date: Tue, 2 Aug 2016 14:04:19 -0700
Subject: [PATCH 046/111] radix-tree: fix comment about "exceptional" bits

The bottom two bits of radix tree entries are reserved for special use
by the radix tree code itself.  A comment detailing their usage was
added by commit 3bcadd6fa6c4 ("radix-tree: free up the bottom bit of
exceptional entries for reuse").

This comment states that if the bottom two bits are '11', this means
that this is a locked exceptional entry.

It turns out that this bit combination was never actually used.  Radix
tree locking for DAX was indeed implemented, but it actually used the
third LSB:

  /* We use lowest available exceptional entry bit for locking */
  #define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)

This locking code was also made specific to the DAX code instead of
being generally implemented in radix-tree.h.

So, fix the comment.

Link: http://lkml.kernel.org/r/1468997731-2155-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/radix-tree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
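
Note: the tag scheme described in the commit message can be sketched in
user space as follows.  This is an illustration only, not the radix tree
implementation: classify_entry(), entry_is_locked() and the SKETCH_*
macros are hypothetical names, and the shift value of 2 is an assumption
that matches the "third LSB" wording above.

```c
#include <assert.h>
#include <stdint.h>

enum entry_kind {
	ENTRY_DATA_POINTER,	/* 00 */
	ENTRY_INTERNAL,		/* 01 */
	ENTRY_EXCEPTIONAL,	/* 10 */
	ENTRY_RESERVED		/* 11: currently unused/reserved */
};

/* Decode the bottom two bits of an entry. */
static enum entry_kind classify_entry(uintptr_t entry)
{
	switch (entry & 0x3) {
	case 0x0:
		return ENTRY_DATA_POINTER;
	case 0x1:
		return ENTRY_INTERNAL;
	case 0x2:
		return ENTRY_EXCEPTIONAL;
	default:
		return ENTRY_RESERVED;
	}
}

/* Assumed: the lock bit is the lowest bit above the two tag bits. */
#define SKETCH_EXCEPTIONAL_SHIFT	2
#define SKETCH_ENTRY_LOCK	((uintptr_t)1 << SKETCH_EXCEPTIONAL_SHIFT)

/* Only meaningful for exceptional entries; tests the third LSB. */
static int entry_is_locked(uintptr_t entry)
{
	assert(classify_entry(entry) == ENTRY_EXCEPTIONAL);
	return (entry & SKETCH_ENTRY_LOCK) != 0;
}
```

With this layout a locked exceptional entry ends in binary 110, not 11,
which is why the '11' combination stays free.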

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index cbfee507c839..4c45105dece3 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -35,7 +35,7 @@
  * 00 - data pointer
  * 01 - internal entry
  * 10 - exceptional entry
- * 11 - locked exceptional entry
+ * 11 - this bit combination is currently unused/reserved
  *
  * The internal entry may be a pointer to the next level in the tree, a
  * sibling entry, or an indicator that the entry in this slot has been moved

From 9ccf98119821defe66ee2ee21f8a11071f63fa65 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <stephen.boyd@linaro.org>
Date: Tue, 2 Aug 2016 14:04:22 -0700
Subject: [PATCH 047/111] firmware: consolidate kmap/read/write logic

Some systems are memory constrained but they need to load very large
firmwares.  The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver.  This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.

This design creates needless memory pressure and delays loading because
we have to copy from kernel memory to somewhere else.  This patch set
adds support to the request firmware API to load the firmware directly
into a pre-allocated buffer, skipping the intermediate copying step and
alleviating memory pressure during firmware loading.  The drawback is
that we can't use the firmware caching feature because the memory for
the firmware cache is not managed by the firmware layer.

This patch (of 3):

We use similarly structured code to read and write the kmapped firmware
pages.  The only difference is read copies from the kmap region and
write copies to it.  Consolidate this into one function to reduce
duplication.

Link: http://lkml.kernel.org/r/20160607164741.31849-2-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/base/firmware_class.c | 57 ++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 31 deletions(-)
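
Note: the consolidation can be tried out in user space with the sketch
below.  It assumes plain arrays in place of kmap()ed pages and a tiny
page size so that page crossings are easy to exercise; paged_rw() and
SKETCH_PAGE_SIZE are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u	/* deliberately small for illustration */

/*
 * Copy `count` bytes between `buffer` and a paged store, starting at byte
 * `offset`.  `read` selects the direction, as in the consolidated helper.
 */
static void paged_rw(unsigned char **pages, unsigned char *buffer,
		     size_t offset, size_t count, bool read)
{
	assert(pages != NULL && buffer != NULL);

	while (count) {
		size_t page_nr = offset / SKETCH_PAGE_SIZE;
		size_t page_ofs = offset % SKETCH_PAGE_SIZE;
		size_t room = SKETCH_PAGE_SIZE - page_ofs;
		size_t chunk = room < count ? room : count;
		unsigned char *page_data = pages[page_nr];

		if (read)
			memcpy(buffer, page_data + page_ofs, chunk);
		else
			memcpy(page_data + page_ofs, buffer, chunk);

		buffer += chunk;
		offset += chunk;
		count -= chunk;
	}
}
```

Because offset and count are passed by value, the caller's copies are not
advanced by the helper.  That is consistent with the write path in the
diff below switching its size update from offset to offset + count.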

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 773fc3099769..01d55723d82c 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -691,6 +691,29 @@ out:
 
 static DEVICE_ATTR(loading, 0644, firmware_loading_show, firmware_loading_store);
 
+static void firmware_rw(struct firmware_buf *buf, char *buffer,
+			loff_t offset, size_t count, bool read)
+{
+	while (count) {
+		void *page_data;
+		int page_nr = offset >> PAGE_SHIFT;
+		int page_ofs = offset & (PAGE_SIZE-1);
+		int page_cnt = min_t(size_t, PAGE_SIZE - page_ofs, count);
+
+		page_data = kmap(buf->pages[page_nr]);
+
+		if (read)
+			memcpy(buffer, page_data + page_ofs, page_cnt);
+		else
+			memcpy(page_data + page_ofs, buffer, page_cnt);
+
+		kunmap(buf->pages[page_nr]);
+		buffer += page_cnt;
+		offset += page_cnt;
+		count -= page_cnt;
+	}
+}
+
 static ssize_t firmware_data_read(struct file *filp, struct kobject *kobj,
 				  struct bin_attribute *bin_attr,
 				  char *buffer, loff_t offset, size_t count)
@@ -715,21 +738,8 @@ static ssize_t firmware_data_read(struct file *filp, struct kobject *kobj,
 
 	ret_count = count;
 
-	while (count) {
-		void *page_data;
-		int page_nr = offset >> PAGE_SHIFT;
-		int page_ofs = offset & (PAGE_SIZE-1);
-		int page_cnt = min_t(size_t, PAGE_SIZE - page_ofs, count);
+	firmware_rw(buf, buffer, offset, count, true);
 
-		page_data = kmap(buf->pages[page_nr]);
-
-		memcpy(buffer, page_data + page_ofs, page_cnt);
-
-		kunmap(buf->pages[page_nr]);
-		buffer += page_cnt;
-		offset += page_cnt;
-		count -= page_cnt;
-	}
 out:
 	mutex_unlock(&fw_lock);
 	return ret_count;
@@ -809,24 +819,9 @@ static ssize_t firmware_data_write(struct file *filp, struct kobject *kobj,
 		goto out;
 
 	retval = count;
+	firmware_rw(buf, buffer, offset, count, false);
 
-	while (count) {
-		void *page_data;
-		int page_nr = offset >> PAGE_SHIFT;
-		int page_ofs = offset & (PAGE_SIZE - 1);
-		int page_cnt = min_t(size_t, PAGE_SIZE - page_ofs, count);
-
-		page_data = kmap(buf->pages[page_nr]);
-
-		memcpy(page_data + page_ofs, buffer, page_cnt);
-
-		kunmap(buf->pages[page_nr]);
-		buffer += page_cnt;
-		offset += page_cnt;
-		count -= page_cnt;
-	}
-
-	buf->size = max_t(size_t, offset, buf->size);
+	buf->size = max_t(size_t, offset + count, buf->size);
 out:
 	mutex_unlock(&fw_lock);
 	return retval;

From 0e742e927571946e08e877d3629e6efd4891ed95 Mon Sep 17 00:00:00 2001
From: Vikram Mulukutla <markivx@codeaurora.org>
Date: Tue, 2 Aug 2016 14:04:25 -0700
Subject: [PATCH 048/111] firmware: provide infrastructure to make fw caching
 optional

Some low memory systems with complex peripherals cannot afford to have
the relatively large firmware images taking up valuable memory during
suspend and resume.  Change the internal implementation of
firmware_class to disallow caching based on a configurable option.  In
the near future, variants of request_firmware will take advantage of
this feature.

Link: http://lkml.kernel.org/r/20160607164741.31849-3-stephen.boyd@linaro.org
[stephen.boyd@linaro.org: Drop firmware_desc design and use flags]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/base/firmware_class.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 01d55723d82c..45ed20cefa10 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -112,6 +112,7 @@ static inline long firmware_loading_timeout(void)
 #define FW_OPT_FALLBACK		0
 #endif
 #define FW_OPT_NO_WARN	(1U << 3)
+#define FW_OPT_NOCACHE	(1U << 4)
 
 struct firmware_cache {
 	/* firmware_buf instance will be added into the below list */
@@ -1065,14 +1066,16 @@ static int assign_firmware_buf(struct firmware *fw, struct device *device,
 	 * should be fixed in devres or driver core.
 	 */
 	/* don't cache firmware handled without uevent */
-	if (device && (opt_flags & FW_OPT_UEVENT))
+	if (device && (opt_flags & FW_OPT_UEVENT) &&
+	    !(opt_flags & FW_OPT_NOCACHE))
 		fw_add_devm_name(device, buf->fw_id);
 
 	/*
 	 * After caching firmware image is started, let it piggyback
 	 * on request firmware.
 	 */
-	if (buf->fwc->state == FW_LOADER_START_CACHE) {
+	if (!(opt_flags & FW_OPT_NOCACHE) &&
+	    buf->fwc->state == FW_LOADER_START_CACHE) {
 		if (fw_cache_piggyback_on_request(buf->fw_id))
 			kref_get(&buf->ref);
 	}

From a098ecd2fa7db8fa4fcc178a43627b29b226edb9 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <stephen.boyd@linaro.org>
Date: Tue, 2 Aug 2016 14:04:28 -0700
Subject: [PATCH 049/111] firmware: support loading into a pre-allocated buffer

Some systems are memory constrained but they need to load very large
firmwares.  The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver.  This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.

This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else.  Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer.  This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem.  It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.

For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place.  I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch.  Plus
the vmalloc pressure is reduced.

This is based on a patch from Vikram Mulukutla on codeaurora.org:
  https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5

Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/base/firmware_class.c | 125 +++++++++++++++++++++++++++-------
 fs/exec.c                     |   9 ++-
 include/linux/firmware.h      |   8 +++
 include/linux/fs.h            |   1 +
 4 files changed, 114 insertions(+), 29 deletions(-)
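
Note: the idea of loading straight into a caller-supplied buffer, bounded
by the buffer size and without an intermediate allocation, can be sketched
in user space as below.  This is not the kernel code path: load_into_buf()
is a hypothetical helper built on stdio, and the error codes it returns
are illustrative choices, not a statement about what the firmware loader
returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read the whole file at `path` into `buf`, which the caller allocated
 * with room for `size` bytes.  Returns the number of bytes loaded, or a
 * negative errno-style value.  Nothing is allocated here.
 */
static long load_into_buf(const char *path, void *buf, size_t size)
{
	FILE *f;
	long len;

	assert(path != NULL && buf != NULL);

	f = fopen(path, "rb");
	if (!f)
		return -ENOENT;

	if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0) {
		fclose(f);
		return -EIO;
	}

	/* Refuse images that do not fit; the caller must size the buffer. */
	if ((size_t)len > size) {
		fclose(f);
		return -EFBIG;
	}

	rewind(f);
	if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
		fclose(f);
		return -EIO;
	}

	fclose(f);
	return len;
}
```

As in the commit message, the caller has to know the required size up
front, and the buffer's lifetime is entirely the caller's responsibility.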

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 45ed20cefa10..22d1760a4278 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -46,7 +46,8 @@ MODULE_LICENSE("GPL");
 extern struct builtin_fw __start_builtin_fw[];
 extern struct builtin_fw __end_builtin_fw[];
 
-static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
+static bool fw_get_builtin_firmware(struct firmware *fw, const char *name,
+				    void *buf, size_t size)
 {
 	struct builtin_fw *b_fw;
 
@@ -54,6 +55,9 @@ static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
 		if (strcmp(name, b_fw->name) == 0) {
 			fw->size = b_fw->size;
 			fw->data = b_fw->data;
+
+			if (buf && fw->size <= size)
+				memcpy(buf, fw->data, fw->size);
 			return true;
 		}
 	}
@@ -74,7 +78,9 @@ static bool fw_is_builtin_firmware(const struct firmware *fw)
 
 #else /* Module case - no builtin firmware support */
 
-static inline bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
+static inline bool fw_get_builtin_firmware(struct firmware *fw,
+					   const char *name, void *buf,
+					   size_t size)
 {
 	return false;
 }
@@ -144,6 +150,7 @@ struct firmware_buf {
 	unsigned long status;
 	void *data;
 	size_t size;
+	size_t allocated_size;
 #ifdef CONFIG_FW_LOADER_USER_HELPER
 	bool is_paged_buf;
 	bool need_uevent;
@@ -179,7 +186,8 @@ static DEFINE_MUTEX(fw_lock);
 static struct firmware_cache fw_cache;
 
 static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
-					      struct firmware_cache *fwc)
+					      struct firmware_cache *fwc,
+					      void *dbuf, size_t size)
 {
 	struct firmware_buf *buf;
 
@@ -195,6 +203,8 @@ static struct firmware_buf *__allocate_fw_buf(const char *fw_name,
 
 	kref_init(&buf->ref);
 	buf->fwc = fwc;
+	buf->data = dbuf;
+	buf->allocated_size = size;
 	init_completion(&buf->completion);
 #ifdef CONFIG_FW_LOADER_USER_HELPER
 	INIT_LIST_HEAD(&buf->pending_list);
@@ -218,7 +228,8 @@ static struct firmware_buf *__fw_lookup_buf(const char *fw_name)
 
 static int fw_lookup_and_allocate_buf(const char *fw_name,
 				      struct firmware_cache *fwc,
-				      struct firmware_buf **buf)
+				      struct firmware_buf **buf, void *dbuf,
+				      size_t size)
 {
 	struct firmware_buf *tmp;
 
@@ -230,7 +241,7 @@ static int fw_lookup_and_allocate_buf(const char *fw_name,
 		*buf = tmp;
 		return 1;
 	}
-	tmp = __allocate_fw_buf(fw_name, fwc);
+	tmp = __allocate_fw_buf(fw_name, fwc, dbuf, size);
 	if (tmp)
 		list_add(&tmp->list, &fwc->head);
 	spin_unlock(&fwc->lock);
@@ -262,6 +273,7 @@ static void __fw_free_buf(struct kref *ref)
 		vfree(buf->pages);
 	} else
 #endif
+	if (!buf->allocated_size)
 		vfree(buf->data);
 	kfree_const(buf->fw_id);
 	kfree(buf);
@@ -302,13 +314,21 @@ static void fw_finish_direct_load(struct device *device,
 	mutex_unlock(&fw_lock);
 }
 
-static int fw_get_filesystem_firmware(struct device *device,
-				       struct firmware_buf *buf)
+static int
+fw_get_filesystem_firmware(struct device *device, struct firmware_buf *buf)
 {
 	loff_t size;
 	int i, len;
 	int rc = -ENOENT;
 	char *path;
+	enum kernel_read_file_id id = READING_FIRMWARE;
+	size_t msize = INT_MAX;
+
+	/* Already populated data member means we're loading into a buffer */
+	if (buf->data) {
+		id = READING_FIRMWARE_PREALLOC_BUFFER;
+		msize = buf->allocated_size;
+	}
 
 	path = __getname();
 	if (!path)
@@ -327,8 +347,8 @@ static int fw_get_filesystem_firmware(struct device *device,
 		}
 
 		buf->size = 0;
-		rc = kernel_read_file_from_path(path, &buf->data, &size,
-						INT_MAX, READING_FIRMWARE);
+		rc = kernel_read_file_from_path(path, &buf->data, &size, msize,
+						id);
 		if (rc) {
 			if (rc == -ENOENT)
 				dev_dbg(device, "loading %s failed with error %d\n",
@@ -692,6 +712,15 @@ out:
 
 static DEVICE_ATTR(loading, 0644, firmware_loading_show, firmware_loading_store);
 
+static void firmware_rw_buf(struct firmware_buf *buf, char *buffer,
+			   loff_t offset, size_t count, bool read)
+{
+	if (read)
+		memcpy(buffer, buf->data + offset, count);
+	else
+		memcpy(buf->data + offset, buffer, count);
+}
+
 static void firmware_rw(struct firmware_buf *buf, char *buffer,
 			loff_t offset, size_t count, bool read)
 {
@@ -739,7 +768,10 @@ static ssize_t firmware_data_read(struct file *filp, struct kobject *kobj,
 
 	ret_count = count;
 
-	firmware_rw(buf, buffer, offset, count, true);
+	if (buf->data)
+		firmware_rw_buf(buf, buffer, offset, count, true);
+	else
+		firmware_rw(buf, buffer, offset, count, true);
 
 out:
 	mutex_unlock(&fw_lock);
@@ -815,12 +847,21 @@ static ssize_t firmware_data_write(struct file *filp, struct kobject *kobj,
 		goto out;
 	}
 
-	retval = fw_realloc_buffer(fw_priv, offset + count);
-	if (retval)
-		goto out;
+	if (buf->data) {
+		if (offset + count > buf->allocated_size) {
+			retval = -ENOMEM;
+			goto out;
+		}
+		firmware_rw_buf(buf, buffer, offset, count, false);
+		retval = count;
+	} else {
+		retval = fw_realloc_buffer(fw_priv, offset + count);
+		if (retval)
+			goto out;
 
-	retval = count;
-	firmware_rw(buf, buffer, offset, count, false);
+		retval = count;
+		firmware_rw(buf, buffer, offset, count, false);
+	}
 
 	buf->size = max_t(size_t, offset + count, buf->size);
 out:
@@ -890,7 +931,8 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
 	struct firmware_buf *buf = fw_priv->buf;
 
 	/* fall back on userspace loading */
-	buf->is_paged_buf = true;
+	if (!buf->data)
+		buf->is_paged_buf = true;
 
 	dev_set_uevent_suppress(f_dev, true);
 
@@ -925,7 +967,7 @@ static int _request_firmware_load(struct firmware_priv *fw_priv,
 
 	if (is_fw_load_aborted(buf))
 		retval = -EAGAIN;
-	else if (!buf->data)
+	else if (buf->is_paged_buf && !buf->data)
 		retval = -ENOMEM;
 
 	device_del(f_dev);
@@ -1008,7 +1050,7 @@ static int sync_cached_firmware_buf(struct firmware_buf *buf)
  */
 static int
 _request_firmware_prepare(struct firmware **firmware_p, const char *name,
-			  struct device *device)
+			  struct device *device, void *dbuf, size_t size)
 {
 	struct firmware *firmware;
 	struct firmware_buf *buf;
@@ -1021,12 +1063,12 @@ _request_firmware_prepare(struct firmware **firmware_p, const char *name,
 		return -ENOMEM;
 	}
 
-	if (fw_get_builtin_firmware(firmware, name)) {
+	if (fw_get_builtin_firmware(firmware, name, dbuf, size)) {
 		dev_dbg(device, "using built-in %s\n", name);
 		return 0; /* assigned */
 	}
 
-	ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf);
+	ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size);
 
 	/*
 	 * bind with 'buf' now to avoid warning in failure path
@@ -1089,7 +1131,8 @@ static int assign_firmware_buf(struct firmware *fw, struct device *device,
 /* called from request_firmware() and request_firmware_work_func() */
 static int
 _request_firmware(const struct firmware **firmware_p, const char *name,
-		  struct device *device, unsigned int opt_flags)
+		  struct device *device, void *buf, size_t size,
+		  unsigned int opt_flags)
 {
 	struct firmware *fw = NULL;
 	long timeout;
@@ -1103,7 +1146,7 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
 		goto out;
 	}
 
-	ret = _request_firmware_prepare(&fw, name, device);
+	ret = _request_firmware_prepare(&fw, name, device, buf, size);
 	if (ret <= 0) /* error or already assigned */
 		goto out;
 
@@ -1182,7 +1225,7 @@ request_firmware(const struct firmware **firmware_p, const char *name,
 
 	/* Need to pin this module until return */
 	__module_get(THIS_MODULE);
-	ret = _request_firmware(firmware_p, name, device,
+	ret = _request_firmware(firmware_p, name, device, NULL, 0,
 				FW_OPT_UEVENT | FW_OPT_FALLBACK);
 	module_put(THIS_MODULE);
 	return ret;
@@ -1206,13 +1249,43 @@ int request_firmware_direct(const struct firmware **firmware_p,
 	int ret;
 
 	__module_get(THIS_MODULE);
-	ret = _request_firmware(firmware_p, name, device,
+	ret = _request_firmware(firmware_p, name, device, NULL, 0,
 				FW_OPT_UEVENT | FW_OPT_NO_WARN);
 	module_put(THIS_MODULE);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(request_firmware_direct);
 
+/**
+ * request_firmware_into_buf - load firmware into a previously allocated buffer
+ * @firmware_p: pointer to firmware image
+ * @name: name of firmware file
+ * @device: device for which firmware is being loaded and DMA region allocated
+ * @buf: address of buffer to load firmware into
+ * @size: size of buffer
+ *
+ * This function works pretty much like request_firmware(), but it doesn't
+ * allocate a buffer to hold the firmware data. Instead, the firmware
+ * is loaded directly into the buffer pointed to by @buf and the @firmware_p
+ * data member is pointed at @buf.
+ *
+ * This function doesn't cache firmware either.
+ */
+int
+request_firmware_into_buf(const struct firmware **firmware_p, const char *name,
+			  struct device *device, void *buf, size_t size)
+{
+	int ret;
+
+	__module_get(THIS_MODULE);
+	ret = _request_firmware(firmware_p, name, device, buf, size,
+				FW_OPT_UEVENT | FW_OPT_FALLBACK |
+				FW_OPT_NOCACHE);
+	module_put(THIS_MODULE);
+	return ret;
+}
+EXPORT_SYMBOL(request_firmware_into_buf);
+
 /**
  * release_firmware: - release the resource associated with a firmware image
  * @fw: firmware resource to release
@@ -1245,7 +1318,7 @@ static void request_firmware_work_func(struct work_struct *work)
 
 	fw_work = container_of(work, struct firmware_work, work);
 
-	_request_firmware(&fw, fw_work->name, fw_work->device,
+	_request_firmware(&fw, fw_work->name, fw_work->device, NULL, 0,
 			  fw_work->opt_flags);
 	fw_work->cont(fw, fw_work->context);
 	put_device(fw_work->device); /* taken in request_firmware_nowait() */
@@ -1378,7 +1451,7 @@ static int uncache_firmware(const char *fw_name)
 
 	pr_debug("%s: %s\n", __func__, fw_name);
 
-	if (fw_get_builtin_firmware(&fw, fw_name))
+	if (fw_get_builtin_firmware(&fw, fw_name, NULL, 0))
 		return 0;
 
 	buf = fw_lookup_buf(fw_name);
diff --git a/fs/exec.c b/fs/exec.c
index ca239fc86d8d..a1789cd684bf 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -866,7 +866,8 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
 		goto out;
 	}
 
-	*buf = vmalloc(i_size);
+	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
+		*buf = vmalloc(i_size);
 	if (!*buf) {
 		ret = -ENOMEM;
 		goto out;
@@ -897,8 +898,10 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
 
 out_free:
 	if (ret < 0) {
-		vfree(*buf);
-		*buf = NULL;
+		if (id != READING_FIRMWARE_PREALLOC_BUFFER) {
+			vfree(*buf);
+			*buf = NULL;
+		}
 	}
 
 out:
diff --git a/include/linux/firmware.h b/include/linux/firmware.h
index 5c41c5e75b5c..b1f9f0ccb8ac 100644
--- a/include/linux/firmware.h
+++ b/include/linux/firmware.h
@@ -47,6 +47,8 @@ int request_firmware_nowait(
 	void (*cont)(const struct firmware *fw, void *context));
 int request_firmware_direct(const struct firmware **fw, const char *name,
 			    struct device *device);
+int request_firmware_into_buf(const struct firmware **firmware_p,
+	const char *name, struct device *device, void *buf, size_t size);
 
 void release_firmware(const struct firmware *fw);
 #else
@@ -75,5 +77,11 @@ static inline int request_firmware_direct(const struct firmware **fw,
 	return -EINVAL;
 }
 
+static inline int request_firmware_into_buf(const struct firmware **firmware_p,
+	const char *name, struct device *device, void *buf, size_t size)
+{
+	return -EINVAL;
+}
+
 #endif
 #endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 577365a77b47..f3f0b4c8e8ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2652,6 +2652,7 @@ extern int do_pipe_flags(int *, int);
 #define __kernel_read_file_id(id) \
 	id(UNKNOWN, unknown)		\
 	id(FIRMWARE, firmware)		\
+	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
 	id(MODULE, kernel-module)		\
 	id(KEXEC_IMAGE, kexec-image)		\
 	id(KEXEC_INITRAMFS, kexec-initramfs)	\

From d560a5f8a46e98c2f83c5fe699e1d6f6393a14cf Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:04:31 -0700
Subject: [PATCH 050/111] checkpatch: skip long lines that use an EFI_GUID
 macro

These are also possible single-line uses that exceed the generic maximum
line length (typically 80 columns).

Link: http://lkml.kernel.org/r/32a6a85fbd6161f1bb55ce176a464e44591afc5b.1468368420.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 24a08363995a..a4476b61e93f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2763,6 +2763,10 @@ sub process {
 				 $line =~ /^\+\s*#\s*define\s+\w+\s+$String$/) {
 				$msg_type = "";
 
+			# EFI_GUID is another special case
+			} elsif ($line =~ /^\+.*\bEFI_GUID\s*\(/) {
+				$msg_type = "";
+
 			# Otherwise set the alternate message types
 
 			# a comment starts before $max_line_length

From dadf680de3c2eb4cba9840619991eda0cfe98778 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:04:33 -0700
Subject: [PATCH 051/111] checkpatch: allow c99 style // comments

Sanitise the lines that contain c99 comments so that the error doesn't
get emitted.

Link: http://lkml.kernel.org/r/d4d22c34ad7bcc1bceb52f0742f76b7a6d585235.1468368420.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index a4476b61e93f..79273003d5e7 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -55,6 +55,7 @@ my $spelling_file = "$D/spelling.txt";
 my $codespell = 0;
 my $codespellfile = "/usr/share/codespell/dictionary.txt";
 my $color = 1;
+my $allow_c99_comments = 1;
 
 sub help {
 	my ($exitcode) = @_;
@@ -1144,6 +1145,11 @@ sub sanitise_line {
 		$res =~ s@(\#\s*(?:error|warning)\s+).*@$1$clean@;
 	}
 
+	if ($allow_c99_comments && $res =~ m@(//.*$)@) {
+		my $match = $1;
+		$res =~ s/\Q$match\E/"$;" x length($match)/e;
+	}
+
 	return $res;
 }
 

From aab38f516aa99e8132f906a526bf44fa59e9daa3 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:04:36 -0700
Subject: [PATCH 052/111] checkpatch: yet another commit id improvement

Using \b isn't good enough to isolate what appears to be a commit id in
a commit message.

Make sure there is a space or a quote-like character after a continuous
run of hexadecimal characters that could be a commit id.

Link: http://lkml.kernel.org/r/fdd22b47463a21c21132edbb8aa35e372950a1e6.1468869915.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
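
Note: the tightened match can be tried outside checkpatch with the sketch
below.  It is a translation of the new pattern into a POSIX extended
regular expression, not the Perl code itself; looks_like_bare_commit_id()
is a hypothetical name, and the other conditions in the checkpatch test
(the Link:/http exclusion, the bracketed-id and fixes: exclusions) are
left out.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if `line` contains a run of 12 to 40 hex digits that starts at
 * the beginning of the line or after whitespace, and is followed by
 * whitespace, a quote, an opening bracket or the end of the line.
 */
static int looks_like_bare_commit_id(const char *line)
{
	static const char pattern[] =
		"(^|[[:space:]])[0-9a-fA-F]{12,40}([[:space:]\"'[(]|$)";
	regex_t re;
	int rc;

	assert(line != NULL);

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0)
		return 0;	/* treat a compile failure as "no match" */

	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

A hex run that is followed by '.' or preceded by '/', as in an index line
of a diff or in a message-id URL, sits on a word boundary but fails this
stricter test.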

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 79273003d5e7..7a28775274a5 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2459,9 +2459,9 @@ sub process {
 
 # Check for git id commit length and improperly formed commit descriptions
 		if ($in_commit_log && !$commit_log_possible_stack_dump &&
-		    $line !~ /^\s*(?:Link|Patchwork|http|BugLink):/i &&
+		    $line !~ /^\s*(?:Link|Patchwork|http|https|BugLink):/i &&
 		    ($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i ||
-		     ($line =~ /\b[0-9a-f]{12,40}\b/i &&
+		     ($line =~ /(?:\s|^)[0-9a-f]{12,40}(?:[\s"'\(\[]|$)/i &&
 		      $line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i &&
 		      $line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) {
 			my $init_char = "c";

From cec3aaa56638c7aad763630b9cbe591f2e791a3b Mon Sep 17 00:00:00 2001
From: Tomas Winkler <tomas.winkler@intel.com>
Date: Tue, 2 Aug 2016 14:04:39 -0700
Subject: [PATCH 053/111] checkpatch: don't complain about BIT macro in uapi

The BIT macro is not exported to UAPI, so don't suggest it for files
under include/uapi/.

Link: http://lkml.kernel.org/r/1468707033-16173-1-git-send-email-tomas.winkler@intel.com
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7a28775274a5..77915e095022 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5732,8 +5732,9 @@ sub process {
 			}
 		}
 
-# check for #defines like: 1 << <digit> that could be BIT(digit)
-		if ($line =~ /#\s*define\s+\w+\s+\(?\s*1\s*([ulUL]*)\s*\<\<\s*(?:\d+|$Ident)\s*\)?/) {
+# check for #defines like: 1 << <digit> that could be BIT(digit), it is not exported to uapi
+		if ($realfile !~ m@^include/uapi/@ &&
+		    $line =~ /#\s*define\s+\w+\s+\(?\s*1\s*([ulUL]*)\s*\<\<\s*(?:\d+|$Ident)\s*\)?/) {
 			my $ull = "";
 			$ull = "_ULL" if (defined($1) && $1 =~ /ll/i);
 			if (CHK("BIT_MACRO",

From c844711575086231890084390a275d06f11a623a Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:04:42 -0700
Subject: [PATCH 054/111] checkpatch: improve 'bare use of' signed/unsigned
 types warning

Fix a false positive for identifiers that end in "signed" and are followed
by an = assignment, which wrongly triggered:

  WARNING: Prefer 'signed int' to bare use of 'signed'

Link: http://lkml.kernel.org/r/6a0e24c3e9102337528ecfcbbe91a0eb5b4820ed.1469529497.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Alan Douglas <alanjhd@gmail.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 77915e095022..1d5b09dd577a 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3347,7 +3347,7 @@ sub process {
 		next if ($line =~ /^[^\+]/);
 
 # check for declarations of signed or unsigned without int
-		while ($line =~ m{($Declare)\s*(?!char\b|short\b|int\b|long\b)\s*($Ident)?\s*[=,;\[\)\(]}g) {
+		while ($line =~ m{\b($Declare)\s*(?!char\b|short\b|int\b|long\b)\s*($Ident)?\s*[=,;\[\)\(]}g) {
 			my $type = $1;
 			my $var = $2;
 			$var = "" if (!defined $var);

From ed43c4e58a6d3061e3329c41d7b880f11541245a Mon Sep 17 00:00:00 2001
From: Allen Hubbe <allenbh@gmail.com>
Date: Tue, 2 Aug 2016 14:04:45 -0700
Subject: [PATCH 055/111] checkpatch: check signoff when reading stdin

Signoff was not checked if the filename is '-', indicating reading the
patch from stdin.  Commands such as the below would not warn about a
missing signoff, because the patch filename is '-'.  This change allows
checkpatch to warn about a missing signoff, even if the input filename
is '-', but only if the patch has a commit message.

  git show --pretty=email | scripts/checkpatch.pl -

A more common use of checkpatch with stdin is for piping git diff
through checkpatch.  The diff output would not contain a commit message,
and therefore it would not contain a signoff line.  For this common use
case, a warning should not be printed about the missing signoff.  With
this change we will only warn about a missing signoff if the input
contains a commit message.

  git diff | scripts/checkpatch.pl -

Before this patch, a workaround for the first command was to refer to
stdin by a name other than '-'.  The workaround is not an elegant
solution, because elsewhere checkpatch uses the fact that filename
equals '-', such as in setting '$vname' to 'Your patch' for stdin.  The
command below would report "/dev/stdin has style problems" instead of
"Your patch has style problems."

  git show --pretty=email | scripts/checkpatch.pl /dev/stdin

Link: http://lkml.kernel.org/r/48be31e414bddc65bccfa6b1322359be9ba032eb.1469670589.git.allenbh@gmail.com
Signed-off-by: Allen Hubbe <allenbh@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 1d5b09dd577a..6f2ce0cafe6f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2069,6 +2069,7 @@ sub process {
 	my $is_patch = 0;
 	my $in_header_lines = $file ? 0 : 1;
 	my $in_commit_log = 0;		#Scanning lines before patch
+	my $has_commit_log = 0;		#Encountered lines before patch
        my $commit_log_possible_stack_dump = 0;
 	my $commit_log_long_line = 0;
 	my $commit_log_has_diff = 0;
@@ -2566,6 +2567,7 @@ sub process {
 		      $rawline =~ /^(commit\b|from\b|[\w-]+:).*$/i)) {
 			$in_header_lines = 0;
 			$in_commit_log = 1;
+			$has_commit_log = 1;
 		}
 
 # Check if there is UTF-8 in a commit log when a mail header has explicitly
@@ -6055,7 +6057,7 @@ sub process {
 		ERROR("NOT_UNIFIED_DIFF",
 		      "Does not appear to be a unified-diff format patch\n");
 	}
-	if ($is_patch && $filename ne '-' && $chk_signoff && $signoff == 0) {
+	if ($is_patch && $has_commit_log && $chk_signoff && $signoff == 0) {
 		ERROR("MISSING_SIGN_OFF",
 		      "Missing Signed-off-by: line(s)\n");
 	}

From 45107ff6d5265b9786c62b694140d839bc3d2433 Mon Sep 17 00:00:00 2001
From: Allen Hubbe <allenbh@gmail.com>
Date: Tue, 2 Aug 2016 14:04:47 -0700
Subject: [PATCH 056/111] checkpatch: if no filenames then read stdin

If no filenames are given, then read the patch from stdin.

Link: http://lkml.kernel.org/r/a8784f291ccb5067361992bf5d41ff6cfb0ce5cb.1469830917.git.allenbh@gmail.com
Signed-off-by: Allen Hubbe <allenbh@gmail.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6f2ce0cafe6f..4de3cc42fc50 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -228,9 +228,9 @@ if ($^V && $^V lt $minimum_perl_version) {
 	}
 }
 
+#if no filenames are given, push '-' to read patch from stdin
 if ($#ARGV < 0) {
-	print "$P: no input files\n";
-	exit(1);
+	push(@ARGV, '-');
 }
 
 sub hash_save_array_words {

From 0036d1f7eb95bcc52977f15507f00dd07018e7e2 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Tue, 2 Aug 2016 14:04:51 -0700
Subject: [PATCH 057/111] binfmt_elf: fix calculations for bss padding

A double bug exists in the bss calculation code: the "last_bss - elf_bss"
calculation can overflow, but vm_brk() internally aligns the argument,
which wraps it back around to a safe value.  We shouldn't depend on
these bugs staying in sync, so this cleans up the bss padding handling
to avoid the overflow.

This moves the bss padzero() before the last_bss > elf_bss case, since
the zero-filling of the ELF_PAGE should have nothing to do with the
relationship of last_bss and elf_bss: any trailing portion should be
zeroed, and a zero size is already handled by padzero().

Then it handles the math on elf_bss vs last_bss correctly.  These need
to both be ELF_PAGE aligned to get the comparison correct, since that's
the expected granularity of the mappings.  Since elf_bss already had
alignment-based padding happen in padzero(), the "start" of the new
vm_brk() should be moved forward as done in the original code.  However,
since the "end" of the vm_brk() area will already become PAGE_ALIGNed in
vm_brk() then last_bss should get aligned here to avoid hiding it as a
side-effect.

Additionally, make a cosmetic change to the initial last_bss calculation
so it's easier to read in comparison to the load_addr calculation above
it (i.e.  the only difference is p_filesz vs p_memsz).
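
As a rough userspace illustration of the arithmetic described above (not
the kernel code itself), the sketch below compares the length that would
be handed to vm_brk() before and after this change.  The 4096-byte page
size and every sketch_* name are assumptions made for the example.

```c
/* Assumed page size for this sketch; the kernel uses ELF_MIN_ALIGN. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGEALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Old logic: only elf_bss is aligned, so the subtraction can wrap. */
static unsigned long sketch_old_brk_len(unsigned long elf_bss,
					unsigned long last_bss)
{
	if (last_bss > elf_bss) {
		elf_bss = SKETCH_PAGEALIGN(elf_bss);
		return last_bss - elf_bss;	/* wraps if both share a page */
	}
	return 0;
}

/* New logic: align both ends first, then compare and subtract. */
static unsigned long sketch_new_brk_len(unsigned long elf_bss,
					unsigned long last_bss)
{
	elf_bss = SKETCH_PAGEALIGN(elf_bss);
	last_bss = SKETCH_PAGEALIGN(last_bss);
	if (last_bss > elf_bss)
		return last_bss - elf_bss;
	return 0;
}
```

With elf_bss = 0x1010 and last_bss = 0x1020 (both in the same page), the
old helper returns a wrapped, very large length, while the new helper
returns 0 because no further pages are needed.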

Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/binfmt_elf.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a7a28110dc80..7f6aff3f72eb 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -605,28 +605,30 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 			 * Do the same thing for the memory mapping - between
 			 * elf_bss and last_bss is the bss section.
 			 */
-			k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
+			k = load_addr + eppnt->p_vaddr + eppnt->p_memsz;
 			if (k > last_bss)
 				last_bss = k;
 		}
 	}
 
+	/*
+	 * Now fill out the bss section: first pad the last page from
+	 * the file up to the page boundary, and zero it from elf_bss
+	 * up to the end of the page.
+	 */
+	if (padzero(elf_bss)) {
+		error = -EFAULT;
+		goto out;
+	}
+	/*
+	 * Next, align both the file and mem bss up to the page size,
+	 * since this is where elf_bss was just zeroed up to, and where
+	 * last_bss will end after the vm_brk() below.
+	 */
+	elf_bss = ELF_PAGEALIGN(elf_bss);
+	last_bss = ELF_PAGEALIGN(last_bss);
+	/* Finally, if there is still more bss to allocate, do it. */
 	if (last_bss > elf_bss) {
-		/*
-		 * Now fill out the bss section.  First pad the last page up
-		 * to the page boundary, and then perform a mmap to make sure
-		 * that there are zero-mapped pages up to and including the
-		 * last bss page.
-		 */
-		if (padzero(elf_bss)) {
-			error = -EFAULT;
-			goto out;
-		}
-
-		/* What we have mapped so far */
-		elf_bss = ELF_PAGESTART(elf_bss + ELF_MIN_ALIGN - 1);
-
-		/* Map the last of the bss segment */
 		error = vm_brk(elf_bss, last_bss - elf_bss);
 		if (error)
 			goto out;

From ba093a6d9397da8eafcfbaa7d95bd34255da39a0 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Tue, 2 Aug 2016 14:04:54 -0700
Subject: [PATCH 058/111] mm: refuse wrapped vm_brk requests

The vm_brk() alignment calculations should refuse to overflow.  The ELF
loader depended on this, but it has been fixed now.  No other unsafe
callers have been found.
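
The check can be pictured with a small userspace sketch.  It assumes a
4096-byte page, and the sketch_* names are hypothetical stand-ins, not
kernel symbols: rounding a request up to a page boundary must never
yield a value smaller than the request itself.

```c
#include <errno.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/*
 * Store the page-aligned length in *len and return 0, or return
 * -ENOMEM when rounding up wrapped past the top of the address range.
 */
static int sketch_align_request(unsigned long request, unsigned long *len)
{
	unsigned long aligned = SKETCH_PAGE_ALIGN(request);

	if (aligned < request)
		return -ENOMEM;
	*len = aligned;
	return 0;
}
```

A request of ~0UL rounds "up" to 0 under unsigned wraparound, which is
smaller than the request, so the sketch rejects it instead of silently
treating it as an empty mapping.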

Link: http://lkml.kernel.org/r/1468014494-25291-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Ismael Ripoll Ripoll <iripoll@upv.es>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 mm/mmap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index d44bee96a5fe..ca9d91bca0d6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2653,16 +2653,18 @@ static inline void verify_mm_writelocked(struct mm_struct *mm)
  *  anonymous maps.  eventually we may be able to do some
  *  brk-specific accounting here.
  */
-static int do_brk(unsigned long addr, unsigned long len)
+static int do_brk(unsigned long addr, unsigned long request)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
-	unsigned long flags;
+	unsigned long flags, len;
 	struct rb_node **rb_link, *rb_parent;
 	pgoff_t pgoff = addr >> PAGE_SHIFT;
 	int error;
 
-	len = PAGE_ALIGN(len);
+	len = PAGE_ALIGN(request);
+	if (len < request)
+		return -ENOMEM;
 	if (!len)
 		return 0;
 

From a310dcb7a43f971ee7d810fbbe36bd766a299717 Mon Sep 17 00:00:00 2001
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
Date: Tue, 2 Aug 2016 14:04:57 -0700
Subject: [PATCH 059/111] fs/binfmt_em86.c: fix incompatible pointer type

Since -Wincompatible-pointer-types is now reported as an error, alpha
no longer builds.  Let's fix it in a minimal way.

  fs/binfmt_em86.c:73:35: error: passing argument 2 of `copy_strings_kernel' from incompatible pointer type [-Werror=incompatible-pointer-types]
     retval = copy_strings_kernel(1, &i_arg, bprm);
                                     ^            ^
  fs/binfmt_em86.c:77:34: error: passing argument 2 of `copy_strings_kernel' from incompatible pointer type [-Werror=incompatible-pointer-types]
    retval = copy_strings_kernel(1, &i_name, bprm);
                                    ^

Link: http://lkml.kernel.org/r/1469525978-23359-1-git-send-email-wagi@monom.org
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/binfmt_em86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 490538536cb4..dd2d3f0cd55d 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -24,7 +24,8 @@
 
 static int load_em86(struct linux_binprm *bprm)
 {
-	char *interp, *i_name, *i_arg;
+	const char *i_name, *i_arg;
+	char *interp;
 	struct file * file;
 	int retval;
 	struct elfhdr	elf_ex;

From cae3d4ca6fd6872d8e9c21eff0e56398c938100a Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:00 -0700
Subject: [PATCH 060/111] nilfs2: hide function name argument from
 nilfs_error()

Simplify nilfs_error(), an output function used to report critical
issues in the file system.  This renames the original nilfs_error()
function to __nilfs_error() and redefines nilfs_error() as a macro that
hides the function name argument.

Every call site of nilfs_error() is changed to strip the __func__
argument, except nilfs_bmap_convert_error(), which calls __nilfs_error()
directly because it inherits its caller's function name.
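
The pattern can be sketched in userspace as follows.  The sketch_* names
and the buffer that captures the report are assumptions made for the
example, and "##__VA_ARGS__" is a GNU C extension, as in the kernel.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Holds the most recent report so the result can be inspected. */
static char sketch_last_report[256];

/* Stand-in for the renamed worker, which takes the function name. */
static void sketch_error_fn(const char *function, const char *fmt, ...)
{
	va_list args;
	size_t used;

	snprintf(sketch_last_report, sizeof(sketch_last_report),
		 "error (%s): ", function);
	used = strlen(sketch_last_report);

	va_start(args, fmt);
	vsnprintf(sketch_last_report + used,
		  sizeof(sketch_last_report) - used, fmt, args);
	va_end(args);
}

/* The macro supplies __func__, so call sites no longer pass it. */
#define sketch_error(fmt, ...) \
	sketch_error_fn(__func__, fmt, ##__VA_ARGS__)

static void sketch_check_page(unsigned long ino)
{
	sketch_error("bad entry in directory #%lu", ino);
}
```

Because __func__ expands at the call site, the report names the function
that invoked the macro rather than the worker.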

Link: http://lkml.kernel.org/r/1464875891-5443-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/bmap.c  |  4 ++--
 fs/nilfs2/dir.c   | 38 +++++++++++++++++---------------------
 fs/nilfs2/ifile.c |  3 +--
 fs/nilfs2/nilfs.h | 20 +++++++++++++++++++-
 fs/nilfs2/super.c | 22 ++++++++++++----------
 5 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/fs/nilfs2/bmap.c b/fs/nilfs2/bmap.c
index f2a7877e0c8c..01fb1831ca25 100644
--- a/fs/nilfs2/bmap.c
+++ b/fs/nilfs2/bmap.c
@@ -41,8 +41,8 @@ static int nilfs_bmap_convert_error(struct nilfs_bmap *bmap,
 	struct inode *inode = bmap->b_inode;
 
 	if (err == -EINVAL) {
-		nilfs_error(inode->i_sb, fname,
-			    "broken bmap (inode number=%lu)", inode->i_ino);
+		__nilfs_error(inode->i_sb, fname,
+			      "broken bmap (inode number=%lu)", inode->i_ino);
 		err = -EIO;
 	}
 	return err;
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index e506f4f7120a..746956d2937a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -140,10 +140,9 @@ out:
 	/* Too bad, we had an error */
 
 Ebadsize:
-	nilfs_error(sb, "nilfs_check_page",
+	nilfs_error(sb,
 		    "size of directory #%lu is not a multiple of chunk size",
-		    dir->i_ino
-	);
+		    dir->i_ino);
 	goto fail;
 Eshort:
 	error = "rec_len is smaller than minimal";
@@ -157,19 +156,18 @@ Enamelen:
 Espan:
 	error = "directory entry across blocks";
 bad_entry:
-	nilfs_error(sb, "nilfs_check_page", "bad entry in directory #%lu: %s - "
-		    "offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
-		    dir->i_ino, error, (page->index<<PAGE_SHIFT)+offs,
-		    (unsigned long) le64_to_cpu(p->inode),
+	nilfs_error(sb,
+		    "bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
+		    dir->i_ino, error, (page->index << PAGE_SHIFT) + offs,
+		    (unsigned long)le64_to_cpu(p->inode),
 		    rec_len, p->name_len);
 	goto fail;
 Eend:
 	p = (struct nilfs_dir_entry *)(kaddr + offs);
-	nilfs_error(sb, "nilfs_check_page",
-		    "entry in directory #%lu spans the page boundary"
-		    "offset=%lu, inode=%lu",
-		    dir->i_ino, (page->index<<PAGE_SHIFT)+offs,
-		    (unsigned long) le64_to_cpu(p->inode));
+	nilfs_error(sb,
+		    "entry in directory #%lu spans the page boundary offset=%lu, inode=%lu",
+		    dir->i_ino, (page->index << PAGE_SHIFT) + offs,
+		    (unsigned long)le64_to_cpu(p->inode));
 fail:
 	SetPageError(page);
 	return false;
@@ -267,8 +265,7 @@ static int nilfs_readdir(struct file *file, struct dir_context *ctx)
 		struct page *page = nilfs_get_page(inode, n);
 
 		if (IS_ERR(page)) {
-			nilfs_error(sb, __func__, "bad page in #%lu",
-				    inode->i_ino);
+			nilfs_error(sb, "bad page in #%lu", inode->i_ino);
 			ctx->pos += PAGE_SIZE - offset;
 			return -EIO;
 		}
@@ -278,8 +275,7 @@ static int nilfs_readdir(struct file *file, struct dir_context *ctx)
 			NILFS_DIR_REC_LEN(1);
 		for ( ; (char *)de <= limit; de = nilfs_next_entry(de)) {
 			if (de->rec_len == 0) {
-				nilfs_error(sb, __func__,
-					    "zero-length directory entry");
+				nilfs_error(sb, "zero-length directory entry");
 				nilfs_put_page(page);
 				return -EIO;
 			}
@@ -345,7 +341,7 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr,
 			kaddr += nilfs_last_byte(dir, n) - reclen;
 			while ((char *) de <= kaddr) {
 				if (de->rec_len == 0) {
-					nilfs_error(dir->i_sb, __func__,
+					nilfs_error(dir->i_sb,
 						"zero-length directory entry");
 					nilfs_put_page(page);
 					goto out;
@@ -360,7 +356,7 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr,
 			n = 0;
 		/* next page is past the blocks we've got */
 		if (unlikely(n > (dir->i_blocks >> (PAGE_SHIFT - 9)))) {
-			nilfs_error(dir->i_sb, __func__,
+			nilfs_error(dir->i_sb,
 			       "dir %lu size %lld exceeds block count %llu",
 			       dir->i_ino, dir->i_size,
 			       (unsigned long long)dir->i_blocks);
@@ -469,7 +465,7 @@ int nilfs_add_link(struct dentry *dentry, struct inode *inode)
 				goto got_it;
 			}
 			if (de->rec_len == 0) {
-				nilfs_error(dir->i_sb, __func__,
+				nilfs_error(dir->i_sb,
 					    "zero-length directory entry");
 				err = -EIO;
 				goto out_unlock;
@@ -541,7 +537,7 @@ int nilfs_delete_entry(struct nilfs_dir_entry *dir, struct page *page)
 
 	while ((char *)de < (char *)dir) {
 		if (de->rec_len == 0) {
-			nilfs_error(inode->i_sb, __func__,
+			nilfs_error(inode->i_sb,
 				    "zero-length directory entry");
 			err = -EIO;
 			goto out;
@@ -628,7 +624,7 @@ int nilfs_empty_dir(struct inode *inode)
 
 		while ((char *)de <= kaddr) {
 			if (de->rec_len == 0) {
-				nilfs_error(inode->i_sb, __func__,
+				nilfs_error(inode->i_sb,
 					    "zero-length directory entry (kaddr=%p, de=%p)",
 					    kaddr, de);
 				goto not_empty;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 1d2b1805327a..b1c96285aa4a 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -145,8 +145,7 @@ int nilfs_ifile_get_inode_block(struct inode *ifile, ino_t ino,
 	int err;
 
 	if (unlikely(!NILFS_VALID_INODE(sb, ino))) {
-		nilfs_error(sb, __func__, "bad inode number: %lu",
-			    (unsigned long) ino);
+		nilfs_error(sb, "bad inode number: %lu", (unsigned long)ino);
 		return -EINVAL;
 	}
 
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index b1d48bc0532d..e482c78bcc86 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -299,10 +299,28 @@ static inline int nilfs_mark_inode_dirty_sync(struct inode *inode)
 /* super.c */
 extern struct inode *nilfs_alloc_inode(struct super_block *);
 extern void nilfs_destroy_inode(struct inode *);
+
 extern __printf(3, 4)
-void nilfs_error(struct super_block *, const char *, const char *, ...);
+void __nilfs_error(struct super_block *sb, const char *function,
+		   const char *fmt, ...);
 extern __printf(3, 4)
 void nilfs_warning(struct super_block *, const char *, const char *, ...);
+
+#ifdef CONFIG_PRINTK
+
+#define nilfs_error(sb, fmt, ...)					\
+	__nilfs_error(sb, __func__, fmt, ##__VA_ARGS__)
+
+#else
+
+#define nilfs_error(sb, fmt, ...)					\
+	do {								\
+		no_printk(fmt, ##__VA_ARGS__);				\
+		__nilfs_error(sb, "", " ");				\
+	} while (0)
+
+#endif /* CONFIG_PRINTK */
+
 extern struct nilfs_super_block *
 nilfs_read_super_block(struct super_block *, u64, int, struct buffer_head **);
 extern int nilfs_store_magic_and_option(struct super_block *,
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 666107a18a22..7fe497eb2181 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -91,19 +91,21 @@ static void nilfs_set_error(struct super_block *sb)
 }
 
 /**
- * nilfs_error() - report failure condition on a filesystem
+ * __nilfs_error() - report failure condition on a filesystem
  *
- * nilfs_error() sets an ERROR_FS flag on the superblock as well as
- * reporting an error message.  It should be called when NILFS detects
- * incoherences or defects of meta data on disk.  As for sustainable
- * errors such as a single-shot I/O error, nilfs_warning() or the printk()
- * function should be used instead.
+ * __nilfs_error() sets an ERROR_FS flag on the superblock as well as
+ * reporting an error message.  This function should be called when
+ * NILFS detects incoherences or defects of meta data on disk.
  *
- * The segment constructor must not call this function because it can
- * kill itself.
+ * This implements the body of nilfs_error() macro.  Normally,
+ * nilfs_error() should be used.  As for sustainable errors such as a
+ * single-shot I/O error, nilfs_warning() or printk() should be used
+ * instead.
+ *
+ * Callers should not add a trailing newline since this will do it.
  */
-void nilfs_error(struct super_block *sb, const char *function,
-		 const char *fmt, ...)
+void __nilfs_error(struct super_block *sb, const char *function,
+		   const char *fmt, ...)
 {
 	struct the_nilfs *nilfs = sb->s_fs_info;
 	struct va_format vaf;

From a66dfb0a91c211c77b5d4e503d3e760e2e566189 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:02 -0700
Subject: [PATCH 061/111] nilfs2: add nilfs_msg() message interface

Define a dedicated output routine to replace bare use of printk().
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.

__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline.  The "device-name" is optional to make it available in early
stages; it is omitted if a NULL pointer is passed as the super block
instance argument.  nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.
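
A userspace sketch of the message layout described above is shown below.
It writes into a buffer instead of calling printk(), and the sketch_*
names and the "<3>"-style level strings are assumptions made for the
example, not the kernel's actual log-level encoding.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Holds the most recent message so the result can be inspected. */
static char sketch_log[256];

/* Build "<level>NILFS (<dev>): <message>\n"; dev may be NULL. */
static void sketch_msg(const char *dev, const char *level,
		       const char *fmt, ...)
{
	va_list args;
	size_t used;

	if (dev)
		snprintf(sketch_log, sizeof(sketch_log),
			 "%sNILFS (%s): ", level, dev);
	else
		snprintf(sketch_log, sizeof(sketch_log),
			 "%sNILFS: ", level);
	used = strlen(sketch_log);

	va_start(args, fmt);
	vsnprintf(sketch_log + used, sizeof(sketch_log) - used, fmt, args);
	va_end(args);

	/* The helper, not the caller, appends the trailing newline. */
	used = strlen(sketch_log);
	if (used + 1 < sizeof(sketch_log)) {
		sketch_log[used] = '\n';
		sketch_log[used + 1] = '\0';
	}
}
```

Callers pass the message without a trailing newline, and the device name
is dropped from the prefix when no device is available yet.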

Link: http://lkml.kernel.org/r/1464875891-5443-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/nilfs.h |  7 +++++++
 fs/nilfs2/super.c | 16 ++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index e482c78bcc86..b57ce41e8a1a 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -300,6 +300,9 @@ static inline int nilfs_mark_inode_dirty_sync(struct inode *inode)
 extern struct inode *nilfs_alloc_inode(struct super_block *);
 extern void nilfs_destroy_inode(struct inode *);
 
+extern __printf(3, 4)
+void __nilfs_msg(struct super_block *sb, const char *level,
+		 const char *fmt, ...);
 extern __printf(3, 4)
 void __nilfs_error(struct super_block *sb, const char *function,
 		   const char *fmt, ...);
@@ -308,11 +311,15 @@ void nilfs_warning(struct super_block *, const char *, const char *, ...);
 
 #ifdef CONFIG_PRINTK
 
+#define nilfs_msg(sb, level, fmt, ...)					\
+	__nilfs_msg(sb, level, fmt, ##__VA_ARGS__)
 #define nilfs_error(sb, fmt, ...)					\
 	__nilfs_error(sb, __func__, fmt, ##__VA_ARGS__)
 
 #else
 
+#define nilfs_msg(sb, level, fmt, ...)					\
+	no_printk(fmt, ##__VA_ARGS__)
 #define nilfs_error(sb, fmt, ...)					\
 	do {								\
 		no_printk(fmt, ##__VA_ARGS__);				\
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 7fe497eb2181..86e3c00994e2 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -71,6 +71,22 @@ struct kmem_cache *nilfs_btree_path_cache;
 static int nilfs_setup_super(struct super_block *sb, int is_mount);
 static int nilfs_remount(struct super_block *sb, int *flags, char *data);
 
+void __nilfs_msg(struct super_block *sb, const char *level, const char *fmt,
+		 ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	if (sb)
+		printk("%sNILFS (%s): %pV\n", level, sb->s_id, &vaf);
+	else
+		printk("%sNILFS: %pV\n", level, &vaf);
+	va_end(args);
+}
+
 static void nilfs_set_error(struct super_block *sb)
 {
 	struct the_nilfs *nilfs = sb->s_fs_info;

From 6625689e159fa1d43572ee113713ab23bec03131 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:06 -0700
Subject: [PATCH 062/111] nilfs2: embed a back pointer to super block instance
 in nilfs object

Insert a back pointer to the super block instance in the nilfs object so
that nilfs2 functions can easily refer to the super block instance.
This simplifies the replacement of printk() in the subsequent change.

Link: http://lkml.kernel.org/r/1464875891-5443-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/super.c     | 2 +-
 fs/nilfs2/the_nilfs.c | 7 ++++---
 fs/nilfs2/the_nilfs.h | 4 +++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 86e3c00994e2..2d4d0bec711e 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1076,7 +1076,7 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	__u64 cno;
 	int err;
 
-	nilfs = alloc_nilfs(sb->s_bdev);
+	nilfs = alloc_nilfs(sb);
 	if (!nilfs)
 		return -ENOMEM;
 
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index e9fd241b9a0a..702115164cf3 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -56,12 +56,12 @@ void nilfs_set_last_segment(struct the_nilfs *nilfs,
 
 /**
  * alloc_nilfs - allocate a nilfs object
- * @bdev: block device to which the_nilfs is related
+ * @sb: super block instance
  *
  * Return Value: On success, pointer to the_nilfs is returned.
  * On error, NULL is returned.
  */
-struct the_nilfs *alloc_nilfs(struct block_device *bdev)
+struct the_nilfs *alloc_nilfs(struct super_block *sb)
 {
 	struct the_nilfs *nilfs;
 
@@ -69,7 +69,8 @@ struct the_nilfs *alloc_nilfs(struct block_device *bdev)
 	if (!nilfs)
 		return NULL;
 
-	nilfs->ns_bdev = bdev;
+	nilfs->ns_sb = sb;
+	nilfs->ns_bdev = sb->s_bdev;
 	atomic_set(&nilfs->ns_ndirtyblks, 0);
 	init_rwsem(&nilfs->ns_sem);
 	mutex_init(&nilfs->ns_snapshot_mount_mutex);
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 79369fd6b13b..79d1421896d0 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -43,6 +43,7 @@ enum {
  * struct the_nilfs - struct to supervise multiple nilfs mount points
  * @ns_flags: flags
  * @ns_flushed_device: flag indicating if all volatile data was flushed
+ * @ns_sb: back pointer to super block instance
  * @ns_bdev: block device
  * @ns_sem: semaphore for shared states
  * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts
@@ -102,6 +103,7 @@ struct the_nilfs {
 	unsigned long		ns_flags;
 	int			ns_flushed_device;
 
+	struct super_block     *ns_sb;
 	struct block_device    *ns_bdev;
 	struct rw_semaphore	ns_sem;
 	struct mutex		ns_snapshot_mount_mutex;
@@ -281,7 +283,7 @@ static inline int nilfs_sb_will_flip(struct the_nilfs *nilfs)
 }
 
 void nilfs_set_last_segment(struct the_nilfs *, sector_t, u64, __u64);
-struct the_nilfs *alloc_nilfs(struct block_device *bdev);
+struct the_nilfs *alloc_nilfs(struct super_block *sb);
 void destroy_nilfs(struct the_nilfs *nilfs);
 int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data);
 int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb);

From feee880fa58254fcc1c78bc8b6446a435cc1baf0 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:10 -0700
Subject: [PATCH 063/111] nilfs2: reduce bare use of printk() with nilfs_msg()

Replace most uses of printk() in the nilfs2 implementation with
nilfs_msg(), reducing the following checkpatch.pl warning:

  "WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
   then dev_crit(dev, ... then pr_crit(...  to printk(KERN_CRIT ..."

This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.

Link: http://lkml.kernel.org/r/1464875891-5443-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/btree.c     |  58 ++++++++++---------
 fs/nilfs2/cpfile.c    |  22 ++++----
 fs/nilfs2/dat.c       |  19 +++----
 fs/nilfs2/direct.c    |  10 ++--
 fs/nilfs2/inode.c     |  11 ++--
 fs/nilfs2/ioctl.c     |  47 ++++++++--------
 fs/nilfs2/recovery.c  |  72 ++++++++++++------------
 fs/nilfs2/segbuf.c    |   6 +-
 fs/nilfs2/segment.c   |  25 ++++-----
 fs/nilfs2/sufile.c    |  31 ++++++-----
 fs/nilfs2/super.c     | 100 ++++++++++++++++-----------------
 fs/nilfs2/sysfs.c     |  30 +++++-----
 fs/nilfs2/the_nilfs.c | 127 ++++++++++++++++++++++--------------------
 13 files changed, 283 insertions(+), 275 deletions(-)

diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index 982d1e3df3a5..2c52693a69a4 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -339,12 +339,14 @@ static int nilfs_btree_node_lookup(const struct nilfs_btree_node *node,
  * nilfs_btree_node_broken - verify consistency of btree node
  * @node: btree node block to be examined
  * @size: node size (in bytes)
+ * @inode: host inode of btree
  * @blocknr: block number
  *
  * Return Value: If node is broken, 1 is returned. Otherwise, 0 is returned.
  */
 static int nilfs_btree_node_broken(const struct nilfs_btree_node *node,
-				   size_t size, sector_t blocknr)
+				   size_t size, struct inode *inode,
+				   sector_t blocknr)
 {
 	int level, flags, nchildren;
 	int ret = 0;
@@ -358,9 +360,10 @@ static int nilfs_btree_node_broken(const struct nilfs_btree_node *node,
 		     (flags & NILFS_BTREE_NODE_ROOT) ||
 		     nchildren < 0 ||
 		     nchildren > NILFS_BTREE_NODE_NCHILDREN_MAX(size))) {
-		printk(KERN_CRIT "NILFS: bad btree node (blocknr=%llu): "
-		       "level = %d, flags = 0x%x, nchildren = %d\n",
-		       (unsigned long long)blocknr, level, flags, nchildren);
+		nilfs_msg(inode->i_sb, KERN_CRIT,
+			  "bad btree node (ino=%lu, blocknr=%llu): level = %d, flags = 0x%x, nchildren = %d",
+			  inode->i_ino, (unsigned long long)blocknr, level,
+			  flags, nchildren);
 		ret = 1;
 	}
 	return ret;
@@ -369,12 +372,12 @@ static int nilfs_btree_node_broken(const struct nilfs_btree_node *node,
 /**
  * nilfs_btree_root_broken - verify consistency of btree root node
  * @node: btree root node to be examined
- * @ino: inode number
+ * @inode: host inode of btree
  *
  * Return Value: If node is broken, 1 is returned. Otherwise, 0 is returned.
  */
 static int nilfs_btree_root_broken(const struct nilfs_btree_node *node,
-				   unsigned long ino)
+				   struct inode *inode)
 {
 	int level, flags, nchildren;
 	int ret = 0;
@@ -387,8 +390,9 @@ static int nilfs_btree_root_broken(const struct nilfs_btree_node *node,
 		     level >= NILFS_BTREE_LEVEL_MAX ||
 		     nchildren < 0 ||
 		     nchildren > NILFS_BTREE_ROOT_NCHILDREN_MAX)) {
-		pr_crit("NILFS: bad btree root (inode number=%lu): level = %d, flags = 0x%x, nchildren = %d\n",
-			ino, level, flags, nchildren);
+		nilfs_msg(inode->i_sb, KERN_CRIT,
+			  "bad btree root (ino=%lu): level = %d, flags = 0x%x, nchildren = %d",
+			  inode->i_ino, level, flags, nchildren);
 		ret = 1;
 	}
 	return ret;
@@ -396,13 +400,15 @@ static int nilfs_btree_root_broken(const struct nilfs_btree_node *node,
 
 int nilfs_btree_broken_node_block(struct buffer_head *bh)
 {
+	struct inode *inode;
 	int ret;
 
 	if (buffer_nilfs_checked(bh))
 		return 0;
 
+	inode = bh->b_page->mapping->host;
 	ret = nilfs_btree_node_broken((struct nilfs_btree_node *)bh->b_data,
-				       bh->b_size, bh->b_blocknr);
+				      bh->b_size, inode, bh->b_blocknr);
 	if (likely(!ret))
 		set_buffer_nilfs_checked(bh);
 	return ret;
@@ -448,13 +454,15 @@ nilfs_btree_get_node(const struct nilfs_bmap *btree,
 	return node;
 }
 
-static int
-nilfs_btree_bad_node(struct nilfs_btree_node *node, int level)
+static int nilfs_btree_bad_node(const struct nilfs_bmap *btree,
+				struct nilfs_btree_node *node, int level)
 {
 	if (unlikely(nilfs_btree_node_get_level(node) != level)) {
 		dump_stack();
-		printk(KERN_CRIT "NILFS: btree level mismatch: %d != %d\n",
-		       nilfs_btree_node_get_level(node), level);
+		nilfs_msg(btree->b_inode->i_sb, KERN_CRIT,
+			  "btree level mismatch (ino=%lu): %d != %d",
+			  btree->b_inode->i_ino,
+			  nilfs_btree_node_get_level(node), level);
 		return 1;
 	}
 	return 0;
@@ -568,7 +576,7 @@ static int nilfs_btree_do_lookup(const struct nilfs_bmap *btree,
 			return ret;
 
 		node = nilfs_btree_get_nonroot_node(path, level);
-		if (nilfs_btree_bad_node(node, level))
+		if (nilfs_btree_bad_node(btree, node, level))
 			return -EINVAL;
 		if (!found)
 			found = nilfs_btree_node_lookup(node, key, &index);
@@ -616,7 +624,7 @@ static int nilfs_btree_do_lookup_last(const struct nilfs_bmap *btree,
 		if (ret < 0)
 			return ret;
 		node = nilfs_btree_get_nonroot_node(path, level);
-		if (nilfs_btree_bad_node(node, level))
+		if (nilfs_btree_bad_node(btree, node, level))
 			return -EINVAL;
 		index = nilfs_btree_node_get_nchildren(node) - 1;
 		ptr = nilfs_btree_node_get_ptr(node, index, ncmax);
@@ -2072,8 +2080,10 @@ static int nilfs_btree_propagate(struct nilfs_bmap *btree,
 	ret = nilfs_btree_do_lookup(btree, path, key, NULL, level + 1, 0);
 	if (ret < 0) {
 		if (unlikely(ret == -ENOENT))
-			printk(KERN_CRIT "%s: key = %llu, level == %d\n",
-			       __func__, (unsigned long long)key, level);
+			nilfs_msg(btree->b_inode->i_sb, KERN_CRIT,
+				  "writing node/leaf block does not appear in b-tree (ino=%lu) at key=%llu, level=%d",
+				  btree->b_inode->i_ino,
+				  (unsigned long long)key, level);
 		goto out;
 	}
 
@@ -2110,12 +2120,11 @@ static void nilfs_btree_add_dirty_buffer(struct nilfs_bmap *btree,
 	if (level < NILFS_BTREE_LEVEL_NODE_MIN ||
 	    level >= NILFS_BTREE_LEVEL_MAX) {
 		dump_stack();
-		printk(KERN_WARNING
-		       "%s: invalid btree level: %d (key=%llu, ino=%lu, "
-		       "blocknr=%llu)\n",
-		       __func__, level, (unsigned long long)key,
-		       NILFS_BMAP_I(btree)->vfs_inode.i_ino,
-		       (unsigned long long)bh->b_blocknr);
+		nilfs_msg(btree->b_inode->i_sb, KERN_WARNING,
+			  "invalid btree level: %d (key=%llu, ino=%lu, blocknr=%llu)",
+			  level, (unsigned long long)key,
+			  btree->b_inode->i_ino,
+			  (unsigned long long)bh->b_blocknr);
 		return;
 	}
 
@@ -2394,8 +2403,7 @@ int nilfs_btree_init(struct nilfs_bmap *bmap)
 
 	__nilfs_btree_init(bmap);
 
-	if (nilfs_btree_root_broken(nilfs_btree_get_root(bmap),
-				    bmap->b_inode->i_ino))
+	if (nilfs_btree_root_broken(nilfs_btree_get_root(bmap), bmap->b_inode))
 		ret = -EIO;
 	return ret;
 }
diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
index 8a3d3b65af3f..19d9f4ae8347 100644
--- a/fs/nilfs2/cpfile.c
+++ b/fs/nilfs2/cpfile.c
@@ -332,9 +332,9 @@ int nilfs_cpfile_delete_checkpoints(struct inode *cpfile,
 	int ret, ncps, nicps, nss, count, i;
 
 	if (unlikely(start == 0 || start > end)) {
-		printk(KERN_ERR "%s: invalid range of checkpoint numbers: "
-		       "[%llu, %llu)\n", __func__,
-		       (unsigned long long)start, (unsigned long long)end);
+		nilfs_msg(cpfile->i_sb, KERN_ERR,
+			  "cannot delete checkpoints: invalid range [%llu, %llu)",
+			  (unsigned long long)start, (unsigned long long)end);
 		return -EINVAL;
 	}
 
@@ -386,9 +386,9 @@ int nilfs_cpfile_delete_checkpoints(struct inode *cpfile,
 								   cpfile, cno);
 					if (ret == 0)
 						continue;
-					printk(KERN_ERR
-					       "%s: cannot delete block\n",
-					       __func__);
+					nilfs_msg(cpfile->i_sb, KERN_ERR,
+						  "error %d deleting checkpoint block",
+						  ret);
 					break;
 				}
 			}
@@ -991,14 +991,12 @@ int nilfs_cpfile_read(struct super_block *sb, size_t cpsize,
 	int err;
 
 	if (cpsize > sb->s_blocksize) {
-		printk(KERN_ERR
-		       "NILFS: too large checkpoint size: %zu bytes.\n",
-		       cpsize);
+		nilfs_msg(sb, KERN_ERR,
+			  "too large checkpoint size: %zu bytes", cpsize);
 		return -EINVAL;
 	} else if (cpsize < NILFS_MIN_CHECKPOINT_SIZE) {
-		printk(KERN_ERR
-		       "NILFS: too small checkpoint size: %zu bytes.\n",
-		       cpsize);
+		nilfs_msg(sb, KERN_ERR,
+			  "too small checkpoint size: %zu bytes", cpsize);
 		return -EINVAL;
 	}
 
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 7367610ea807..dffedb2f8817 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -349,10 +349,11 @@ int nilfs_dat_move(struct inode *dat, __u64 vblocknr, sector_t blocknr)
 	kaddr = kmap_atomic(entry_bh->b_page);
 	entry = nilfs_palloc_block_get_entry(dat, vblocknr, entry_bh, kaddr);
 	if (unlikely(entry->de_blocknr == cpu_to_le64(0))) {
-		printk(KERN_CRIT "%s: vbn = %llu, [%llu, %llu)\n", __func__,
-		       (unsigned long long)vblocknr,
-		       (unsigned long long)le64_to_cpu(entry->de_start),
-		       (unsigned long long)le64_to_cpu(entry->de_end));
+		nilfs_msg(dat->i_sb, KERN_CRIT,
+			  "%s: invalid vblocknr = %llu, [%llu, %llu)",
+			  __func__, (unsigned long long)vblocknr,
+			  (unsigned long long)le64_to_cpu(entry->de_start),
+			  (unsigned long long)le64_to_cpu(entry->de_end));
 		kunmap_atomic(kaddr);
 		brelse(entry_bh);
 		return -EINVAL;
@@ -479,14 +480,12 @@ int nilfs_dat_read(struct super_block *sb, size_t entry_size,
 	int err;
 
 	if (entry_size > sb->s_blocksize) {
-		printk(KERN_ERR
-		       "NILFS: too large DAT entry size: %zu bytes.\n",
-		       entry_size);
+		nilfs_msg(sb, KERN_ERR, "too large DAT entry size: %zu bytes",
+			  entry_size);
 		return -EINVAL;
 	} else if (entry_size < NILFS_MIN_DAT_ENTRY_SIZE) {
-		printk(KERN_ERR
-		       "NILFS: too small DAT entry size: %zu bytes.\n",
-		       entry_size);
+		nilfs_msg(sb, KERN_ERR, "too small DAT entry size: %zu bytes",
+			  entry_size);
 		return -EINVAL;
 	}
 
diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c
index 251a44928405..96e3ed0d9652 100644
--- a/fs/nilfs2/direct.c
+++ b/fs/nilfs2/direct.c
@@ -337,14 +337,16 @@ static int nilfs_direct_assign(struct nilfs_bmap *bmap,
 
 	key = nilfs_bmap_data_get_key(bmap, *bh);
 	if (unlikely(key > NILFS_DIRECT_KEY_MAX)) {
-		printk(KERN_CRIT "%s: invalid key: %llu\n", __func__,
-		       (unsigned long long)key);
+		nilfs_msg(bmap->b_inode->i_sb, KERN_CRIT,
+			  "%s (ino=%lu): invalid key: %llu", __func__,
+			  bmap->b_inode->i_ino, (unsigned long long)key);
 		return -EINVAL;
 	}
 	ptr = nilfs_direct_get_ptr(bmap, key);
 	if (unlikely(ptr == NILFS_BMAP_INVALID_PTR)) {
-		printk(KERN_CRIT "%s: invalid pointer: %llu\n", __func__,
-		       (unsigned long long)ptr);
+		nilfs_msg(bmap->b_inode->i_sb, KERN_CRIT,
+			  "%s (ino=%lu): invalid pointer: %llu", __func__,
+			  bmap->b_inode->i_ino, (unsigned long long)ptr);
 		return -EINVAL;
 	}
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index a0ebdb17e912..a965fcf77955 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -112,13 +112,10 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
 				 * However, the page having this block must
 				 * be locked in this case.
 				 */
-				printk(KERN_WARNING
-				       "nilfs_get_block: a race condition "
-				       "while inserting a data block. "
-				       "(inode number=%lu, file block "
-				       "offset=%llu)\n",
-				       inode->i_ino,
-				       (unsigned long long)blkoff);
+				nilfs_msg(inode->i_sb, KERN_WARNING,
+					  "%s (ino=%lu): a race condition while inserting a data block at offset=%llu",
+					  __func__, inode->i_ino,
+					  (unsigned long long)blkoff);
 				err = 0;
 			}
 			nilfs_transaction_abort(inode->i_sb);
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 358b57e2cdf9..827283fe9525 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -584,27 +584,25 @@ static int nilfs_ioctl_move_inode_block(struct inode *inode,
 
 	if (unlikely(ret < 0)) {
 		if (ret == -ENOENT)
-			printk(KERN_CRIT
-			       "%s: invalid virtual block address (%s): "
-			       "ino=%llu, cno=%llu, offset=%llu, "
-			       "blocknr=%llu, vblocknr=%llu\n",
-			       __func__, vdesc->vd_flags ? "node" : "data",
-			       (unsigned long long)vdesc->vd_ino,
-			       (unsigned long long)vdesc->vd_cno,
-			       (unsigned long long)vdesc->vd_offset,
-			       (unsigned long long)vdesc->vd_blocknr,
-			       (unsigned long long)vdesc->vd_vblocknr);
+			nilfs_msg(inode->i_sb, KERN_CRIT,
+				  "%s: invalid virtual block address (%s): ino=%llu, cno=%llu, offset=%llu, blocknr=%llu, vblocknr=%llu",
+				  __func__, vdesc->vd_flags ? "node" : "data",
+				  (unsigned long long)vdesc->vd_ino,
+				  (unsigned long long)vdesc->vd_cno,
+				  (unsigned long long)vdesc->vd_offset,
+				  (unsigned long long)vdesc->vd_blocknr,
+				  (unsigned long long)vdesc->vd_vblocknr);
 		return ret;
 	}
 	if (unlikely(!list_empty(&bh->b_assoc_buffers))) {
-		printk(KERN_CRIT "%s: conflicting %s buffer: ino=%llu, "
-		       "cno=%llu, offset=%llu, blocknr=%llu, vblocknr=%llu\n",
-		       __func__, vdesc->vd_flags ? "node" : "data",
-		       (unsigned long long)vdesc->vd_ino,
-		       (unsigned long long)vdesc->vd_cno,
-		       (unsigned long long)vdesc->vd_offset,
-		       (unsigned long long)vdesc->vd_blocknr,
-		       (unsigned long long)vdesc->vd_vblocknr);
+		nilfs_msg(inode->i_sb, KERN_CRIT,
+			  "%s: conflicting %s buffer: ino=%llu, cno=%llu, offset=%llu, blocknr=%llu, vblocknr=%llu",
+			  __func__, vdesc->vd_flags ? "node" : "data",
+			  (unsigned long long)vdesc->vd_ino,
+			  (unsigned long long)vdesc->vd_cno,
+			  (unsigned long long)vdesc->vd_offset,
+			  (unsigned long long)vdesc->vd_blocknr,
+			  (unsigned long long)vdesc->vd_vblocknr);
 		brelse(bh);
 		return -EEXIST;
 	}
@@ -854,8 +852,8 @@ int nilfs_ioctl_prepare_clean_segments(struct the_nilfs *nilfs,
 	return 0;
 
  failed:
-	printk(KERN_ERR "NILFS: GC failed during preparation: %s: err=%d\n",
-	       msg, ret);
+	nilfs_msg(nilfs->ns_sb, KERN_ERR, "error %d preparing GC: %s", ret,
+		  msg);
 	return ret;
 }
 
@@ -963,10 +961,11 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 	}
 
 	ret = nilfs_ioctl_move_blocks(inode->i_sb, &argv[0], kbufs[0]);
-	if (ret < 0)
-		printk(KERN_ERR "NILFS: GC failed during preparation: "
-			"cannot read source blocks: err=%d\n", ret);
-	else {
+	if (ret < 0) {
+		nilfs_msg(inode->i_sb, KERN_ERR,
+			  "error %d preparing GC: cannot read source blocks",
+			  ret);
+	} else {
 		if (nilfs_sb_need_update(nilfs))
 			set_nilfs_discontinued(nilfs);
 		ret = nilfs_clean_segments(inode->i_sb, argv, kbufs);
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index d893dc912b62..5139efed1888 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -54,38 +54,37 @@ struct nilfs_recovery_block {
 };
 
 
-static int nilfs_warn_segment_error(int err)
+static int nilfs_warn_segment_error(struct super_block *sb, int err)
 {
+	const char *msg = NULL;
+
 	switch (err) {
 	case NILFS_SEG_FAIL_IO:
-		printk(KERN_WARNING
-		       "NILFS warning: I/O error on loading last segment\n");
+		nilfs_msg(sb, KERN_ERR, "I/O error reading segment");
 		return -EIO;
 	case NILFS_SEG_FAIL_MAGIC:
-		printk(KERN_WARNING
-		       "NILFS warning: Segment magic number invalid\n");
+		msg = "Magic number mismatch";
 		break;
 	case NILFS_SEG_FAIL_SEQ:
-		printk(KERN_WARNING
-		       "NILFS warning: Sequence number mismatch\n");
+		msg = "Sequence number mismatch";
 		break;
 	case NILFS_SEG_FAIL_CHECKSUM_SUPER_ROOT:
-		printk(KERN_WARNING
-		       "NILFS warning: Checksum error in super root\n");
+		msg = "Checksum error in super root";
 		break;
 	case NILFS_SEG_FAIL_CHECKSUM_FULL:
-		printk(KERN_WARNING
-		       "NILFS warning: Checksum error in segment payload\n");
+		msg = "Checksum error in segment payload";
 		break;
 	case NILFS_SEG_FAIL_CONSISTENCY:
-		printk(KERN_WARNING
-		       "NILFS warning: Inconsistent segment\n");
+		msg = "Inconsistency found";
 		break;
 	case NILFS_SEG_NO_SUPER_ROOT:
-		printk(KERN_WARNING
-		       "NILFS warning: No super root in the last segment\n");
+		msg = "No super root in the last segment";
 		break;
+	default:
+		nilfs_msg(sb, KERN_ERR, "unrecognized segment error %d", err);
+		return -EINVAL;
 	}
+	nilfs_msg(sb, KERN_WARNING, "invalid segment: %s", msg);
 	return -EINVAL;
 }
 
@@ -178,7 +177,7 @@ int nilfs_read_super_root_block(struct the_nilfs *nilfs, sector_t sr_block,
 	brelse(bh_sr);
 
  failed:
-	return nilfs_warn_segment_error(ret);
+	return nilfs_warn_segment_error(nilfs->ns_sb, ret);
 }
 
 /**
@@ -553,11 +552,10 @@ static int nilfs_recover_dsync_blocks(struct the_nilfs *nilfs,
 		put_page(page);
 
  failed_inode:
-		printk(KERN_WARNING
-		       "NILFS warning: error recovering data block "
-		       "(err=%d, ino=%lu, block-offset=%llu)\n",
-		       err, (unsigned long)rb->ino,
-		       (unsigned long long)rb->blkoff);
+		nilfs_msg(sb, KERN_WARNING,
+			  "error %d recovering data block (ino=%lu, block-offset=%llu)",
+			  err, (unsigned long)rb->ino,
+			  (unsigned long long)rb->blkoff);
 		if (!err2)
 			err2 = err;
  next:
@@ -680,8 +678,8 @@ static int nilfs_do_roll_forward(struct the_nilfs *nilfs,
 	}
 
 	if (nsalvaged_blocks) {
-		printk(KERN_INFO "NILFS (device %s): salvaged %lu blocks\n",
-		       sb->s_id, nsalvaged_blocks);
+		nilfs_msg(sb, KERN_INFO, "salvaged %lu blocks",
+			  nsalvaged_blocks);
 		ri->ri_need_recovery = NILFS_RECOVERY_ROLLFORWARD_DONE;
 	}
  out:
@@ -692,10 +690,9 @@ static int nilfs_do_roll_forward(struct the_nilfs *nilfs,
  confused:
 	err = -EINVAL;
  failed:
-	printk(KERN_ERR
-	       "NILFS (device %s): Error roll-forwarding "
-	       "(err=%d, pseg block=%llu). ",
-	       sb->s_id, err, (unsigned long long)pseg_start);
+	nilfs_msg(sb, KERN_ERR,
+		  "error %d roll-forwarding partial segment at blocknr = %llu",
+		  err, (unsigned long long)pseg_start);
 	goto out;
 }
 
@@ -715,9 +712,8 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
 	set_buffer_dirty(bh);
 	err = sync_dirty_buffer(bh);
 	if (unlikely(err))
-		printk(KERN_WARNING
-		       "NILFS warning: buffer sync write failed during "
-		       "post-cleaning of recovery.\n");
+		nilfs_msg(nilfs->ns_sb, KERN_WARNING,
+			  "buffer sync write failed during post-cleaning of recovery");
 	brelse(bh);
 }
 
@@ -752,8 +748,8 @@ int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs,
 
 	err = nilfs_attach_checkpoint(sb, ri->ri_cno, true, &root);
 	if (unlikely(err)) {
-		printk(KERN_ERR
-		       "NILFS: error loading the latest checkpoint.\n");
+		nilfs_msg(sb, KERN_ERR,
+			  "error %d loading the latest checkpoint", err);
 		return err;
 	}
 
@@ -764,8 +760,9 @@ int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs,
 	if (ri->ri_need_recovery == NILFS_RECOVERY_ROLLFORWARD_DONE) {
 		err = nilfs_prepare_segment_for_recovery(nilfs, sb, ri);
 		if (unlikely(err)) {
-			printk(KERN_ERR "NILFS: Error preparing segments for "
-			       "recovery.\n");
+			nilfs_msg(sb, KERN_ERR,
+				  "error %d preparing segment for recovery",
+				  err);
 			goto failed;
 		}
 
@@ -778,8 +775,9 @@ int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs,
 		nilfs_detach_log_writer(sb);
 
 		if (unlikely(err)) {
-			printk(KERN_ERR "NILFS: Oops! recovery failed. "
-			       "(err=%d)\n", err);
+			nilfs_msg(sb, KERN_ERR,
+				  "error %d writing segment for recovery",
+				  err);
 			goto failed;
 		}
 
@@ -961,5 +959,5 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
  failed:
 	brelse(bh_sum);
 	nilfs_dispose_segment_list(&segments);
-	return (ret < 0) ? ret : nilfs_warn_segment_error(ret);
+	return ret < 0 ? ret : nilfs_warn_segment_error(nilfs->ns_sb, ret);
 }
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index a962d7d83447..6f87b2ac1aeb 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -514,7 +514,11 @@ static int nilfs_segbuf_wait(struct nilfs_segment_buffer *segbuf)
 	} while (--segbuf->sb_nbio > 0);
 
 	if (unlikely(atomic_read(&segbuf->sb_err) > 0)) {
-		printk(KERN_ERR "NILFS: IO error writing segment\n");
+		nilfs_msg(segbuf->sb_super, KERN_ERR,
+			  "I/O error writing log (start-blocknr=%llu, block-count=%lu) in segment %llu",
+			  (unsigned long long)segbuf->sb_pseg_start,
+			  segbuf->sb_sum.nblocks,
+			  (unsigned long long)segbuf->sb_segnum);
 		err = -EIO;
 	}
 	return err;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index e78b68a81aec..1cc968502e53 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -150,7 +150,8 @@ static void nilfs_dispose_list(struct the_nilfs *, struct list_head *, int);
 #define nilfs_cnt32_lt(a, b)  nilfs_cnt32_gt(b, a)
 #define nilfs_cnt32_le(a, b)  nilfs_cnt32_ge(b, a)
 
-static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
+static int nilfs_prepare_segment_lock(struct super_block *sb,
+				      struct nilfs_transaction_info *ti)
 {
 	struct nilfs_transaction_info *cur_ti = current->journal_info;
 	void *save = NULL;
@@ -164,8 +165,7 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
 		 * it is saved and will be restored on
 		 * nilfs_transaction_commit().
 		 */
-		printk(KERN_WARNING
-		       "NILFS warning: journal info from a different FS\n");
+		nilfs_msg(sb, KERN_WARNING, "journal info from a different FS");
 		save = current->journal_info;
 	}
 	if (!ti) {
@@ -215,7 +215,7 @@ int nilfs_transaction_begin(struct super_block *sb,
 			    int vacancy_check)
 {
 	struct the_nilfs *nilfs;
-	int ret = nilfs_prepare_segment_lock(ti);
+	int ret = nilfs_prepare_segment_lock(sb, ti);
 	struct nilfs_transaction_info *trace_ti;
 
 	if (unlikely(ret < 0))
@@ -2467,9 +2467,9 @@ int nilfs_clean_segments(struct super_block *sb, struct nilfs_argv *argv,
 		int ret = nilfs_discard_segments(nilfs, sci->sc_freesegs,
 						 sci->sc_nfreesegs);
 		if (ret) {
-			printk(KERN_WARNING
-			       "NILFS warning: error %d on discard request, "
-			       "turning discards off for the device\n", ret);
+			nilfs_msg(sb, KERN_WARNING,
+				  "error %d on discard request, turning discards off for the device",
+				  ret);
 			nilfs_clear_opt(nilfs, DISCARD);
 		}
 	}
@@ -2551,10 +2551,9 @@ static int nilfs_segctor_thread(void *arg)
 	/* start sync. */
 	sci->sc_task = current;
 	wake_up(&sci->sc_wait_task); /* for nilfs_segctor_start_thread() */
-	printk(KERN_INFO
-	       "segctord starting. Construction interval = %lu seconds, "
-	       "CP frequency < %lu seconds\n",
-	       sci->sc_interval / HZ, sci->sc_mjcp_freq / HZ);
+	nilfs_msg(sci->sc_super, KERN_INFO,
+		  "segctord starting. Construction interval = %lu seconds, CP frequency < %lu seconds",
+		  sci->sc_interval / HZ, sci->sc_mjcp_freq / HZ);
 
 	spin_lock(&sci->sc_state_lock);
  loop:
@@ -2628,8 +2627,8 @@ static int nilfs_segctor_start_thread(struct nilfs_sc_info *sci)
 	if (IS_ERR(t)) {
 		int err = PTR_ERR(t);
 
-		printk(KERN_ERR "NILFS: error %d creating segctord thread\n",
-		       err);
+		nilfs_msg(sci->sc_super, KERN_ERR,
+			  "error %d creating segctord thread", err);
 		return err;
 	}
 	wait_event(sci->sc_wait_task, sci->sc_task != NULL);
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 1963595a1580..5b495c469471 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -181,9 +181,9 @@ int nilfs_sufile_updatev(struct inode *sufile, __u64 *segnumv, size_t nsegs,
 	down_write(&NILFS_MDT(sufile)->mi_sem);
 	for (seg = segnumv; seg < segnumv + nsegs; seg++) {
 		if (unlikely(*seg >= nilfs_sufile_get_nsegments(sufile))) {
-			printk(KERN_WARNING
-			       "%s: invalid segment number: %llu\n", __func__,
-			       (unsigned long long)*seg);
+			nilfs_msg(sufile->i_sb, KERN_WARNING,
+				  "%s: invalid segment number: %llu",
+				  __func__, (unsigned long long)*seg);
 			nerr++;
 		}
 	}
@@ -240,8 +240,9 @@ int nilfs_sufile_update(struct inode *sufile, __u64 segnum, int create,
 	int ret;
 
 	if (unlikely(segnum >= nilfs_sufile_get_nsegments(sufile))) {
-		printk(KERN_WARNING "%s: invalid segment number: %llu\n",
-		       __func__, (unsigned long long)segnum);
+		nilfs_msg(sufile->i_sb, KERN_WARNING,
+			  "%s: invalid segment number: %llu",
+			  __func__, (unsigned long long)segnum);
 		return -EINVAL;
 	}
 	down_write(&NILFS_MDT(sufile)->mi_sem);
@@ -419,8 +420,9 @@ void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
 	kaddr = kmap_atomic(su_bh->b_page);
 	su = nilfs_sufile_block_get_segment_usage(sufile, segnum, su_bh, kaddr);
 	if (unlikely(!nilfs_segment_usage_clean(su))) {
-		printk(KERN_WARNING "%s: segment %llu must be clean\n",
-		       __func__, (unsigned long long)segnum);
+		nilfs_msg(sufile->i_sb, KERN_WARNING,
+			  "%s: segment %llu must be clean", __func__,
+			  (unsigned long long)segnum);
 		kunmap_atomic(kaddr);
 		return;
 	}
@@ -476,8 +478,9 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
 	kaddr = kmap_atomic(su_bh->b_page);
 	su = nilfs_sufile_block_get_segment_usage(sufile, segnum, su_bh, kaddr);
 	if (nilfs_segment_usage_clean(su)) {
-		printk(KERN_WARNING "%s: segment %llu is already clean\n",
-		       __func__, (unsigned long long)segnum);
+		nilfs_msg(sufile->i_sb, KERN_WARNING,
+			  "%s: segment %llu is already clean",
+			  __func__, (unsigned long long)segnum);
 		kunmap_atomic(kaddr);
 		return;
 	}
@@ -1175,14 +1178,12 @@ int nilfs_sufile_read(struct super_block *sb, size_t susize,
 	int err;
 
 	if (susize > sb->s_blocksize) {
-		printk(KERN_ERR
-		       "NILFS: too large segment usage size: %zu bytes.\n",
-		       susize);
+		nilfs_msg(sb, KERN_ERR,
+			  "too large segment usage size: %zu bytes", susize);
 		return -EINVAL;
 	} else if (susize < NILFS_MIN_SEGMENT_USAGE_SIZE) {
-		printk(KERN_ERR
-		       "NILFS: too small segment usage size: %zu bytes.\n",
-		       susize);
+		nilfs_msg(sb, KERN_ERR,
+			  "too small segment usage size: %zu bytes", susize);
 		return -EINVAL;
 	}
 
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 2d4d0bec711e..90c62b489857 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -214,8 +214,8 @@ static int nilfs_sync_super(struct super_block *sb, int flag)
 	}
 
 	if (unlikely(err)) {
-		printk(KERN_ERR
-		       "NILFS: unable to write superblock (err=%d)\n", err);
+		nilfs_msg(sb, KERN_ERR, "unable to write superblock: err=%d",
+			  err);
 		if (err == -EIO && nilfs->ns_sbh[1]) {
 			/*
 			 * sbp[0] points to newer log than sbp[1],
@@ -285,8 +285,7 @@ struct nilfs_super_block **nilfs_prepare_super(struct super_block *sb,
 		    sbp[1]->s_magic == cpu_to_le16(NILFS_SUPER_MAGIC)) {
 			memcpy(sbp[0], sbp[1], nilfs->ns_sbsize);
 		} else {
-			printk(KERN_CRIT "NILFS: superblock broke on dev %s\n",
-			       sb->s_id);
+			nilfs_msg(sb, KERN_CRIT, "superblock broke");
 			return NULL;
 		}
 	} else if (sbp[1] &&
@@ -396,9 +395,9 @@ static int nilfs_move_2nd_super(struct super_block *sb, loff_t sb2off)
 	offset = sb2off & (nilfs->ns_blocksize - 1);
 	nsbh = sb_getblk(sb, newblocknr);
 	if (!nsbh) {
-		printk(KERN_WARNING
-		       "NILFS warning: unable to move secondary superblock "
-		       "to block %llu\n", (unsigned long long)newblocknr);
+		nilfs_msg(sb, KERN_WARNING,
+			  "unable to move secondary superblock to block %llu",
+			  (unsigned long long)newblocknr);
 		ret = -EIO;
 		goto out;
 	}
@@ -561,10 +560,9 @@ int nilfs_attach_checkpoint(struct super_block *sb, __u64 cno, int curr_mnt,
 	up_read(&nilfs->ns_segctor_sem);
 	if (unlikely(err)) {
 		if (err == -ENOENT || err == -EINVAL) {
-			printk(KERN_ERR
-			       "NILFS: Invalid checkpoint "
-			       "(checkpoint number=%llu)\n",
-			       (unsigned long long)cno);
+			nilfs_msg(sb, KERN_ERR,
+				  "Invalid checkpoint (checkpoint number=%llu)",
+				  (unsigned long long)cno);
 			err = -EINVAL;
 		}
 		goto failed;
@@ -660,9 +658,8 @@ static int nilfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	err = nilfs_ifile_count_free_inodes(root->ifile,
 					    &nmaxinodes, &nfreeinodes);
 	if (unlikely(err)) {
-		printk(KERN_WARNING
-			"NILFS warning: fail to count free inodes: err %d.\n",
-			err);
+		nilfs_msg(sb, KERN_WARNING,
+			  "failed to count free inodes: err=%d", err);
 		if (err == -ERANGE) {
 			/*
 			 * If nilfs_palloc_count_max_entries() returns
@@ -794,9 +791,9 @@ static int parse_options(char *options, struct super_block *sb, int is_remount)
 			break;
 		case Opt_snapshot:
 			if (is_remount) {
-				printk(KERN_ERR
-				       "NILFS: \"%s\" option is invalid "
-				       "for remount.\n", p);
+				nilfs_msg(sb, KERN_ERR,
+					  "\"%s\" option is invalid for remount",
+					  p);
 				return 0;
 			}
 			break;
@@ -810,8 +807,8 @@ static int parse_options(char *options, struct super_block *sb, int is_remount)
 			nilfs_clear_opt(nilfs, DISCARD);
 			break;
 		default:
-			printk(KERN_ERR
-			       "NILFS: Unrecognized mount option \"%s\"\n", p);
+			nilfs_msg(sb, KERN_ERR,
+				  "unrecognized mount option \"%s\"", p);
 			return 0;
 		}
 	}
@@ -847,12 +844,10 @@ static int nilfs_setup_super(struct super_block *sb, int is_mount)
 	mnt_count = le16_to_cpu(sbp[0]->s_mnt_count);
 
 	if (nilfs->ns_mount_state & NILFS_ERROR_FS) {
-		printk(KERN_WARNING
-		       "NILFS warning: mounting fs with errors\n");
+		nilfs_msg(sb, KERN_WARNING, "mounting fs with errors");
 #if 0
 	} else if (max_mnt_count >= 0 && mnt_count >= max_mnt_count) {
-		printk(KERN_WARNING
-		       "NILFS warning: maximal mount count reached\n");
+		nilfs_msg(sb, KERN_WARNING, "maximal mount count reached");
 #endif
 	}
 	if (!max_mnt_count)
@@ -915,17 +910,17 @@ int nilfs_check_feature_compatibility(struct super_block *sb,
 	features = le64_to_cpu(sbp->s_feature_incompat) &
 		~NILFS_FEATURE_INCOMPAT_SUPP;
 	if (features) {
-		printk(KERN_ERR "NILFS: couldn't mount because of unsupported "
-		       "optional features (%llx)\n",
-		       (unsigned long long)features);
+		nilfs_msg(sb, KERN_ERR,
+			  "couldn't mount because of unsupported optional features (%llx)",
+			  (unsigned long long)features);
 		return -EINVAL;
 	}
 	features = le64_to_cpu(sbp->s_feature_compat_ro) &
 		~NILFS_FEATURE_COMPAT_RO_SUPP;
 	if (!(sb->s_flags & MS_RDONLY) && features) {
-		printk(KERN_ERR "NILFS: couldn't mount RDWR because of "
-		       "unsupported optional features (%llx)\n",
-		       (unsigned long long)features);
+		nilfs_msg(sb, KERN_ERR,
+			  "couldn't mount RDWR because of unsupported optional features (%llx)",
+			  (unsigned long long)features);
 		return -EINVAL;
 	}
 	return 0;
@@ -941,13 +936,13 @@ static int nilfs_get_root_dentry(struct super_block *sb,
 
 	inode = nilfs_iget(sb, root, NILFS_ROOT_INO);
 	if (IS_ERR(inode)) {
-		printk(KERN_ERR "NILFS: get root inode failed\n");
 		ret = PTR_ERR(inode);
+		nilfs_msg(sb, KERN_ERR, "error %d getting root inode", ret);
 		goto out;
 	}
 	if (!S_ISDIR(inode->i_mode) || !inode->i_blocks || !inode->i_size) {
 		iput(inode);
-		printk(KERN_ERR "NILFS: corrupt root inode.\n");
+		nilfs_msg(sb, KERN_ERR, "corrupt root inode");
 		ret = -EINVAL;
 		goto out;
 	}
@@ -975,7 +970,7 @@ static int nilfs_get_root_dentry(struct super_block *sb,
 	return ret;
 
  failed_dentry:
-	printk(KERN_ERR "NILFS: get root dentry failed\n");
+	nilfs_msg(sb, KERN_ERR, "error %d getting root dentry", ret);
 	goto out;
 }
 
@@ -995,18 +990,18 @@ static int nilfs_attach_snapshot(struct super_block *s, __u64 cno,
 		ret = (ret == -ENOENT) ? -EINVAL : ret;
 		goto out;
 	} else if (!ret) {
-		printk(KERN_ERR "NILFS: The specified checkpoint is "
-		       "not a snapshot (checkpoint number=%llu).\n",
-		       (unsigned long long)cno);
+		nilfs_msg(s, KERN_ERR,
+			  "The specified checkpoint is not a snapshot (checkpoint number=%llu)",
+			  (unsigned long long)cno);
 		ret = -EINVAL;
 		goto out;
 	}
 
 	ret = nilfs_attach_checkpoint(s, cno, false, &root);
 	if (ret) {
-		printk(KERN_ERR "NILFS: error loading snapshot "
-		       "(checkpoint number=%llu).\n",
-	       (unsigned long long)cno);
+		nilfs_msg(s, KERN_ERR,
+			  "error %d while loading snapshot (checkpoint number=%llu)",
+			  ret, (unsigned long long)cno);
 		goto out;
 	}
 	ret = nilfs_get_root_dentry(s, root, root_dentry);
@@ -1101,8 +1096,9 @@ nilfs_fill_super(struct super_block *sb, void *data, int silent)
 	cno = nilfs_last_cno(nilfs);
 	err = nilfs_attach_checkpoint(sb, cno, true, &fsroot);
 	if (err) {
-		printk(KERN_ERR "NILFS: error loading last checkpoint "
-		       "(checkpoint number=%llu).\n", (unsigned long long)cno);
+		nilfs_msg(sb, KERN_ERR,
+			  "error %d while loading last checkpoint (checkpoint number=%llu)",
+			  err, (unsigned long long)cno);
 		goto failed_unload;
 	}
 
@@ -1162,9 +1158,8 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
 	err = -EINVAL;
 
 	if (!nilfs_valid_fs(nilfs)) {
-		printk(KERN_WARNING "NILFS (device %s): couldn't "
-		       "remount because the filesystem is in an "
-		       "incomplete recovery state.\n", sb->s_id);
+		nilfs_msg(sb, KERN_WARNING,
+			  "couldn't remount because the filesystem is in an incomplete recovery state");
 		goto restore_opts;
 	}
 
@@ -1196,10 +1191,9 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
 			~NILFS_FEATURE_COMPAT_RO_SUPP;
 		up_read(&nilfs->ns_sem);
 		if (features) {
-			printk(KERN_WARNING "NILFS (device %s): couldn't "
-			       "remount RDWR because of unsupported optional "
-			       "features (%llx)\n",
-			       sb->s_id, (unsigned long long)features);
+			nilfs_msg(sb, KERN_WARNING,
+				  "couldn't remount RDWR because of unsupported optional features (%llx)",
+				  (unsigned long long)features);
 			err = -EROFS;
 			goto restore_opts;
 		}
@@ -1262,8 +1256,8 @@ static int nilfs_identify(char *data, struct nilfs_super_data *sd)
 				}
 			}
 			if (ret)
-				printk(KERN_ERR
-				       "NILFS: invalid mount option: %s\n", p);
+				nilfs_msg(NULL, KERN_ERR,
+					  "invalid mount option: %s", p);
 		}
 		if (!options)
 			break;
@@ -1344,10 +1338,10 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 	} else if (!sd.cno) {
 		if (nilfs_tree_is_busy(s->s_root)) {
 			if ((flags ^ s->s_flags) & MS_RDONLY) {
-				printk(KERN_ERR "NILFS: the device already "
-				       "has a %s mount.\n",
-				       (s->s_flags & MS_RDONLY) ?
-				       "read-only" : "read/write");
+				nilfs_msg(s, KERN_ERR,
+					  "the device already has a %s mount",
+					  (s->s_flags & MS_RDONLY) ?
+					  "read-only" : "read/write");
 				err = -EBUSY;
 				goto failed_super;
 			}
diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c
index 8ffa42b704d8..8e57bb91fe16 100644
--- a/fs/nilfs2/sysfs.c
+++ b/fs/nilfs2/sysfs.c
@@ -272,8 +272,8 @@ nilfs_checkpoints_checkpoints_number_show(struct nilfs_checkpoints_attr *attr,
 	err = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat);
 	up_read(&nilfs->ns_segctor_sem);
 	if (err < 0) {
-		printk(KERN_ERR "NILFS: unable to get checkpoint stat: err=%d\n",
-			err);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "unable to get checkpoint stat: err=%d", err);
 		return err;
 	}
 
@@ -295,8 +295,8 @@ nilfs_checkpoints_snapshots_number_show(struct nilfs_checkpoints_attr *attr,
 	err = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat);
 	up_read(&nilfs->ns_segctor_sem);
 	if (err < 0) {
-		printk(KERN_ERR "NILFS: unable to get checkpoint stat: err=%d\n",
-			err);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "unable to get checkpoint stat: err=%d", err);
 		return err;
 	}
 
@@ -414,8 +414,8 @@ nilfs_segments_dirty_segments_show(struct nilfs_segments_attr *attr,
 	err = nilfs_sufile_get_stat(nilfs->ns_sufile, &sustat);
 	up_read(&nilfs->ns_segctor_sem);
 	if (err < 0) {
-		printk(KERN_ERR "NILFS: unable to get segment stat: err=%d\n",
-			err);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "unable to get segment stat: err=%d", err);
 		return err;
 	}
 
@@ -789,14 +789,15 @@ nilfs_superblock_sb_update_frequency_store(struct nilfs_superblock_attr *attr,
 
 	err = kstrtouint(skip_spaces(buf), 0, &val);
 	if (err) {
-		printk(KERN_ERR "NILFS: unable to convert string: err=%d\n",
-			err);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "unable to convert string: err=%d", err);
 		return err;
 	}
 
 	if (val < NILFS_SB_FREQ) {
 		val = NILFS_SB_FREQ;
-		printk(KERN_WARNING "NILFS: superblock update frequency cannot be lesser than 10 seconds\n");
+		nilfs_msg(nilfs->ns_sb, KERN_WARNING,
+			  "superblock update frequency cannot be less than 10 seconds");
 	}
 
 	down_write(&nilfs->ns_sem);
@@ -999,7 +1000,8 @@ int nilfs_sysfs_create_device_group(struct super_block *sb)
 	nilfs->ns_dev_subgroups = kzalloc(devgrp_size, GFP_KERNEL);
 	if (unlikely(!nilfs->ns_dev_subgroups)) {
 		err = -ENOMEM;
-		printk(KERN_ERR "NILFS: unable to allocate memory for device group\n");
+		nilfs_msg(sb, KERN_ERR,
+			  "unable to allocate memory for device group");
 		goto failed_create_device_group;
 	}
 
@@ -1109,15 +1111,15 @@ int __init nilfs_sysfs_init(void)
 	nilfs_kset = kset_create_and_add(NILFS_ROOT_GROUP_NAME, NULL, fs_kobj);
 	if (!nilfs_kset) {
 		err = -ENOMEM;
-		printk(KERN_ERR "NILFS: unable to create sysfs entry: err %d\n",
-			err);
+		nilfs_msg(NULL, KERN_ERR,
+			  "unable to create sysfs entry: err=%d", err);
 		goto failed_sysfs_init;
 	}
 
 	err = sysfs_create_group(&nilfs_kset->kobj, &nilfs_feature_attr_group);
 	if (unlikely(err)) {
-		printk(KERN_ERR "NILFS: unable to create feature group: err %d\n",
-			err);
+		nilfs_msg(NULL, KERN_ERR,
+			  "unable to create feature group: err=%d", err);
 		goto cleanup_sysfs_init;
 	}
 
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 702115164cf3..2dd75bf619ad 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -192,7 +192,10 @@ static int nilfs_store_log_cursor(struct the_nilfs *nilfs,
 		nilfs_get_segnum_of_block(nilfs, nilfs->ns_last_pseg);
 	nilfs->ns_cno = nilfs->ns_last_cno + 1;
 	if (nilfs->ns_segnum >= nilfs->ns_nsegments) {
-		printk(KERN_ERR "NILFS invalid last segment number.\n");
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "pointed segment number is out of range: segnum=%llu, nsegments=%lu",
+			  (unsigned long long)nilfs->ns_segnum,
+			  nilfs->ns_nsegments);
 		ret = -EINVAL;
 	}
 	return ret;
@@ -216,12 +219,12 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 	int err;
 
 	if (!valid_fs) {
-		printk(KERN_WARNING "NILFS warning: mounting unchecked fs\n");
+		nilfs_msg(sb, KERN_WARNING, "mounting unchecked fs");
 		if (s_flags & MS_RDONLY) {
-			printk(KERN_INFO "NILFS: INFO: recovery "
-			       "required for readonly filesystem.\n");
-			printk(KERN_INFO "NILFS: write access will "
-			       "be enabled during recovery.\n");
+			nilfs_msg(sb, KERN_INFO,
+				  "recovery required for readonly filesystem");
+			nilfs_msg(sb, KERN_INFO,
+				  "write access will be enabled during recovery");
 		}
 	}
 
@@ -236,13 +239,12 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 			goto scan_error;
 
 		if (!nilfs_valid_sb(sbp[1])) {
-			printk(KERN_WARNING
-			       "NILFS warning: unable to fall back to spare"
-			       "super block\n");
+			nilfs_msg(sb, KERN_WARNING,
+				  "unable to fall back to spare super block");
 			goto scan_error;
 		}
-		printk(KERN_INFO
-		       "NILFS: try rollback from an earlier position\n");
+		nilfs_msg(sb, KERN_INFO,
+			  "trying rollback from an earlier position");
 
 		/*
 		 * restore super block with its spare and reconfigure
@@ -255,10 +257,9 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 		/* verify consistency between two super blocks */
 		blocksize = BLOCK_SIZE << le32_to_cpu(sbp[0]->s_log_block_size);
 		if (blocksize != nilfs->ns_blocksize) {
-			printk(KERN_WARNING
-			       "NILFS warning: blocksize differs between "
-			       "two super blocks (%d != %d)\n",
-			       blocksize, nilfs->ns_blocksize);
+			nilfs_msg(sb, KERN_WARNING,
+				  "blocksize differs between two super blocks (%d != %d)",
+				  blocksize, nilfs->ns_blocksize);
 			goto scan_error;
 		}
 
@@ -277,7 +278,8 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 
 	err = nilfs_load_super_root(nilfs, sb, ri.ri_super_root);
 	if (unlikely(err)) {
-		printk(KERN_ERR "NILFS: error loading super root.\n");
+		nilfs_msg(sb, KERN_ERR, "error %d while loading super root",
+			  err);
 		goto failed;
 	}
 
@@ -288,30 +290,29 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 		__u64 features;
 
 		if (nilfs_test_opt(nilfs, NORECOVERY)) {
-			printk(KERN_INFO "NILFS: norecovery option specified. "
-			       "skipping roll-forward recovery\n");
+			nilfs_msg(sb, KERN_INFO,
+				  "norecovery option specified, skipping roll-forward recovery");
 			goto skip_recovery;
 		}
 		features = le64_to_cpu(nilfs->ns_sbp[0]->s_feature_compat_ro) &
 			~NILFS_FEATURE_COMPAT_RO_SUPP;
 		if (features) {
-			printk(KERN_ERR "NILFS: couldn't proceed with "
-			       "recovery because of unsupported optional "
-			       "features (%llx)\n",
-			       (unsigned long long)features);
+			nilfs_msg(sb, KERN_ERR,
+				  "couldn't proceed with recovery because of unsupported optional features (%llx)",
+				  (unsigned long long)features);
 			err = -EROFS;
 			goto failed_unload;
 		}
 		if (really_read_only) {
-			printk(KERN_ERR "NILFS: write access "
-			       "unavailable, cannot proceed.\n");
+			nilfs_msg(sb, KERN_ERR,
+				  "write access unavailable, cannot proceed");
 			err = -EROFS;
 			goto failed_unload;
 		}
 		sb->s_flags &= ~MS_RDONLY;
 	} else if (nilfs_test_opt(nilfs, NORECOVERY)) {
-		printk(KERN_ERR "NILFS: recovery cancelled because norecovery "
-		       "option was specified for a read/write mount\n");
+		nilfs_msg(sb, KERN_ERR,
+			  "recovery cancelled because norecovery option was specified for a read/write mount");
 		err = -EINVAL;
 		goto failed_unload;
 	}
@@ -326,11 +327,12 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 	up_write(&nilfs->ns_sem);
 
 	if (err) {
-		printk(KERN_ERR "NILFS: failed to update super block. "
-		       "recovery unfinished.\n");
+		nilfs_msg(sb, KERN_ERR,
+			  "error %d updating super block. recovery unfinished.",
+			  err);
 		goto failed_unload;
 	}
-	printk(KERN_INFO "NILFS: recovery complete.\n");
+	nilfs_msg(sb, KERN_INFO, "recovery complete");
 
  skip_recovery:
 	nilfs_clear_recovery_info(&ri);
@@ -338,7 +340,7 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
 	return 0;
 
  scan_error:
-	printk(KERN_ERR "NILFS: error searching super root.\n");
+	nilfs_msg(sb, KERN_ERR, "error %d while searching super root", err);
 	goto failed;
 
  failed_unload:
@@ -385,12 +387,11 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
 				   struct nilfs_super_block *sbp)
 {
 	if (le32_to_cpu(sbp->s_rev_level) < NILFS_MIN_SUPP_REV) {
-		printk(KERN_ERR "NILFS: unsupported revision "
-		       "(superblock rev.=%d.%d, current rev.=%d.%d). "
-		       "Please check the version of mkfs.nilfs.\n",
-		       le32_to_cpu(sbp->s_rev_level),
-		       le16_to_cpu(sbp->s_minor_rev_level),
-		       NILFS_CURRENT_REV, NILFS_MINOR_REV);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "unsupported revision (superblock rev.=%d.%d, current rev.=%d.%d). Please check the version of mkfs.nilfs(2).",
+			  le32_to_cpu(sbp->s_rev_level),
+			  le16_to_cpu(sbp->s_minor_rev_level),
+			  NILFS_CURRENT_REV, NILFS_MINOR_REV);
 		return -EINVAL;
 	}
 	nilfs->ns_sbsize = le16_to_cpu(sbp->s_bytes);
@@ -399,12 +400,14 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
 
 	nilfs->ns_inode_size = le16_to_cpu(sbp->s_inode_size);
 	if (nilfs->ns_inode_size > nilfs->ns_blocksize) {
-		printk(KERN_ERR "NILFS: too large inode size: %d bytes.\n",
-		       nilfs->ns_inode_size);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "too large inode size: %d bytes",
+			  nilfs->ns_inode_size);
 		return -EINVAL;
 	} else if (nilfs->ns_inode_size < NILFS_MIN_INODE_SIZE) {
-		printk(KERN_ERR "NILFS: too small inode size: %d bytes.\n",
-		       nilfs->ns_inode_size);
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "too small inode size: %d bytes",
+			  nilfs->ns_inode_size);
 		return -EINVAL;
 	}
 
@@ -412,7 +415,9 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
 
 	nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
 	if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
-		printk(KERN_ERR "NILFS: too short segment.\n");
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "too short segment: %lu blocks",
+			  nilfs->ns_blocks_per_segment);
 		return -EINVAL;
 	}
 
@@ -421,7 +426,9 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
 		le32_to_cpu(sbp->s_r_segments_percentage);
 	if (nilfs->ns_r_segments_percentage < 1 ||
 	    nilfs->ns_r_segments_percentage > 99) {
-		printk(KERN_ERR "NILFS: invalid reserved segments percentage.\n");
+		nilfs_msg(nilfs->ns_sb, KERN_ERR,
+			  "invalid reserved segments percentage: %lu",
+			  nilfs->ns_r_segments_percentage);
 		return -EINVAL;
 	}
 
@@ -505,16 +512,16 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
 
 	if (!sbp[0]) {
 		if (!sbp[1]) {
-			printk(KERN_ERR "NILFS: unable to read superblock\n");
+			nilfs_msg(sb, KERN_ERR, "unable to read superblock");
 			return -EIO;
 		}
-		printk(KERN_WARNING
-		       "NILFS warning: unable to read primary superblock "
-		       "(blocksize = %d)\n", blocksize);
+		nilfs_msg(sb, KERN_WARNING,
+			  "unable to read primary superblock (blocksize = %d)",
+			  blocksize);
 	} else if (!sbp[1]) {
-		printk(KERN_WARNING
-		       "NILFS warning: unable to read secondary superblock "
-		       "(blocksize = %d)\n", blocksize);
+		nilfs_msg(sb, KERN_WARNING,
+			  "unable to read secondary superblock (blocksize = %d)",
+			  blocksize);
 	}
 
 	/*
@@ -536,14 +543,14 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
 	}
 	if (!valid[swp]) {
 		nilfs_release_super_block(nilfs);
-		printk(KERN_ERR "NILFS: Can't find nilfs on dev %s.\n",
-		       sb->s_id);
+		nilfs_msg(sb, KERN_ERR, "couldn't find nilfs on the device");
 		return -EINVAL;
 	}
 
 	if (!valid[!swp])
-		printk(KERN_WARNING "NILFS warning: broken superblock. "
-		       "using spare superblock (blocksize = %d).\n", blocksize);
+		nilfs_msg(sb, KERN_WARNING,
+			  "broken superblock, retrying with spare superblock (blocksize = %d)",
+			  blocksize);
 	if (swp)
 		nilfs_swap_super_block(nilfs);
 
@@ -577,7 +584,7 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data)
 
 	blocksize = sb_min_blocksize(sb, NILFS_MIN_BLOCK_SIZE);
 	if (!blocksize) {
-		printk(KERN_ERR "NILFS: unable to set blocksize\n");
+		nilfs_msg(sb, KERN_ERR, "unable to set blocksize");
 		err = -EINVAL;
 		goto out;
 	}
@@ -596,8 +603,9 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data)
 	blocksize = BLOCK_SIZE << le32_to_cpu(sbp->s_log_block_size);
 	if (blocksize < NILFS_MIN_BLOCK_SIZE ||
 	    blocksize > NILFS_MAX_BLOCK_SIZE) {
-		printk(KERN_ERR "NILFS: couldn't mount because of unsupported "
-		       "filesystem blocksize %d\n", blocksize);
+		nilfs_msg(sb, KERN_ERR,
+			  "couldn't mount because of unsupported filesystem blocksize %d",
+			  blocksize);
 		err = -EINVAL;
 		goto failed_sbh;
 	}
@@ -605,10 +613,9 @@ int init_nilfs(struct the_nilfs *nilfs, struct super_block *sb, char *data)
 		int hw_blocksize = bdev_logical_block_size(sb->s_bdev);
 
 		if (blocksize < hw_blocksize) {
-			printk(KERN_ERR
-			       "NILFS: blocksize %d too small for device "
-			       "(sector-size = %d).\n",
-			       blocksize, hw_blocksize);
+			nilfs_msg(sb, KERN_ERR,
+				  "blocksize %d too small for device (sector-size = %d)",
+				  blocksize, hw_blocksize);
 			err = -EINVAL;
 			goto failed_sbh;
 		}

From d6517deb014954d3229910e46f3b85b7ad80db3e Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:14 -0700
Subject: [PATCH 064/111] nilfs2: replace nilfs_warning() with nilfs_msg()

Use nilfs_msg() to output warning messages and get rid of the
nilfs_warning() function.  Function names are no longer printed
automatically; they appear only where a format string embeds them
explicitly.  To compensate, some messages are revised to clarify the
context.
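
For illustration only, the prefixing idea can be sketched in user space.
Everything below is hypothetical: demo_msg() is not part of nilfs2, it
formats into a caller-supplied buffer instead of calling printk(), and
the "NILFS (dev): " / "NILFS: " prefixes are an assumption about the
output layout rather than something taken from this patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for a nilfs_msg()-like helper.
 * Writes "NILFS (<dev>): <message>" into buf, or "NILFS: <message>"
 * when no device name is available (dev == NULL).
 * Returns the number of characters written, or -1 on truncation.
 */
static int demo_msg(char *buf, size_t size, const char *dev,
		    const char *fmt, ...)
{
	va_list args;
	int n, m;

	assert(buf && size > 0 && fmt);

	/* Common prefix, so callers do not repeat it in each format. */
	n = dev ? snprintf(buf, size, "NILFS (%s): ", dev)
		: snprintf(buf, size, "NILFS: ");
	if (n < 0 || (size_t)n >= size)
		return -1;

	/* Caller-supplied message; no trailing newline is expected. */
	va_start(args, fmt);
	m = vsnprintf(buf + n, size - n, fmt, args);
	va_end(args);
	if (m < 0 || (size_t)m >= size - n)
		return -1;

	return n + m;
}
```

A caller that still wants the function name in the message passes it
explicitly, e.g. demo_msg(buf, sizeof(buf), "sda1", "%s: failed",
__func__), which mirrors how the palloc messages in this patch embed
__func__ in their format strings.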

[arnd@arndb.de: avoid warning about unused variables]
  Link: http://lkml.kernel.org/r/20160615201945.3348205-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1464875891-5443-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/alloc.c   | 45 +++++++++++++++++++++------------------------
 fs/nilfs2/ifile.c   |  4 ++--
 fs/nilfs2/inode.c   | 21 +++++++++++----------
 fs/nilfs2/namei.c   |  6 +++---
 fs/nilfs2/nilfs.h   |  7 ++++---
 fs/nilfs2/page.c    | 19 +++++++++----------
 fs/nilfs2/segment.c | 20 ++++++++++----------
 fs/nilfs2/super.c   | 21 +--------------------
 8 files changed, 61 insertions(+), 82 deletions(-)

diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 1a85d94f5b25..2c90e285d7c6 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -622,10 +622,10 @@ void nilfs_palloc_commit_free_entry(struct inode *inode,
 	lock = nilfs_mdt_bgl_lock(inode, group);
 
 	if (!nilfs_clear_bit_atomic(lock, group_offset, bitmap))
-		nilfs_warning(inode->i_sb, __func__,
-			      "entry number %llu already freed: ino=%lu",
-			      (unsigned long long)req->pr_entry_nr,
-			      (unsigned long)inode->i_ino);
+		nilfs_msg(inode->i_sb, KERN_WARNING,
+			  "%s (ino=%lu): entry number %llu already freed",
+			  __func__, inode->i_ino,
+			  (unsigned long long)req->pr_entry_nr);
 	else
 		nilfs_palloc_group_desc_add_entries(desc, lock, 1);
 
@@ -663,10 +663,10 @@ void nilfs_palloc_abort_alloc_entry(struct inode *inode,
 	lock = nilfs_mdt_bgl_lock(inode, group);
 
 	if (!nilfs_clear_bit_atomic(lock, group_offset, bitmap))
-		nilfs_warning(inode->i_sb, __func__,
-			      "entry number %llu already freed: ino=%lu",
-			      (unsigned long long)req->pr_entry_nr,
-			      (unsigned long)inode->i_ino);
+		nilfs_msg(inode->i_sb, KERN_WARNING,
+			  "%s (ino=%lu): entry number %llu already freed",
+			  __func__, inode->i_ino,
+			  (unsigned long long)req->pr_entry_nr);
 	else
 		nilfs_palloc_group_desc_add_entries(desc, lock, 1);
 
@@ -772,10 +772,10 @@ int nilfs_palloc_freev(struct inode *inode, __u64 *entry_nrs, size_t nitems)
 		do {
 			if (!nilfs_clear_bit_atomic(lock, group_offset,
 						    bitmap)) {
-				nilfs_warning(inode->i_sb, __func__,
-					      "entry number %llu already freed: ino=%lu",
-					      (unsigned long long)entry_nrs[j],
-					      (unsigned long)inode->i_ino);
+				nilfs_msg(inode->i_sb, KERN_WARNING,
+					  "%s (ino=%lu): entry number %llu already freed",
+					  __func__, inode->i_ino,
+					  (unsigned long long)entry_nrs[j]);
 			} else {
 				n++;
 			}
@@ -816,12 +816,11 @@ int nilfs_palloc_freev(struct inode *inode, __u64 *entry_nrs, size_t nitems)
 		for (k = 0; k < nempties; k++) {
 			ret = nilfs_palloc_delete_entry_block(inode,
 							      last_nrs[k]);
-			if (ret && ret != -ENOENT) {
-				nilfs_warning(inode->i_sb, __func__,
-					      "failed to delete block of entry %llu: ino=%lu, err=%d",
-					      (unsigned long long)last_nrs[k],
-					      (unsigned long)inode->i_ino, ret);
-			}
+			if (ret && ret != -ENOENT)
+				nilfs_msg(inode->i_sb, KERN_WARNING,
+					  "error %d deleting block that object (entry=%llu, ino=%lu) belongs to",
+					  ret, (unsigned long long)last_nrs[k],
+					  inode->i_ino);
 		}
 
 		desc_kaddr = kmap_atomic(desc_bh->b_page);
@@ -835,12 +834,10 @@ int nilfs_palloc_freev(struct inode *inode, __u64 *entry_nrs, size_t nitems)
 
 		if (nfree == nilfs_palloc_entries_per_group(inode)) {
 			ret = nilfs_palloc_delete_bitmap_block(inode, group);
-			if (ret && ret != -ENOENT) {
-				nilfs_warning(inode->i_sb, __func__,
-					      "failed to delete bitmap block of group %lu: ino=%lu, err=%d",
-					      group,
-					      (unsigned long)inode->i_ino, ret);
-			}
+			if (ret && ret != -ENOENT)
+				nilfs_msg(inode->i_sb, KERN_WARNING,
+					  "error %d deleting bitmap block of group=%lu, ino=%lu",
+					  ret, group, inode->i_ino);
 		}
 	}
 	return 0;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index b1c96285aa4a..b8fa45c20c63 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -151,8 +151,8 @@ int nilfs_ifile_get_inode_block(struct inode *ifile, ino_t ino,
 
 	err = nilfs_palloc_get_entry_block(ifile, ino, 0, out_bh);
 	if (unlikely(err))
-		nilfs_warning(sb, __func__, "unable to read inode: %lu",
-			      (unsigned long) ino);
+		nilfs_msg(sb, KERN_WARNING, "error %d reading inode: ino=%lu",
+			  err, (unsigned long)ino);
 	return err;
 }
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index a965fcf77955..b286b35174a5 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -723,9 +723,9 @@ repeat:
 		goto repeat;
 
 failed:
-	nilfs_warning(ii->vfs_inode.i_sb, __func__,
-		      "failed to truncate bmap (ino=%lu, err=%d)",
-		      ii->vfs_inode.i_ino, ret);
+	nilfs_msg(ii->vfs_inode.i_sb, KERN_WARNING,
+		  "error %d truncating bmap (ino=%lu)", ret,
+		  ii->vfs_inode.i_ino);
 }
 
 void nilfs_truncate(struct inode *inode)
@@ -936,9 +936,9 @@ int nilfs_set_file_dirty(struct inode *inode, unsigned int nr_dirty)
 			 * This will happen when somebody is freeing
 			 * this inode.
 			 */
-			nilfs_warning(inode->i_sb, __func__,
-				      "cannot get inode (ino=%lu)",
-				      inode->i_ino);
+			nilfs_msg(inode->i_sb, KERN_WARNING,
+				  "cannot set file dirty (ino=%lu): the file is being freed",
+				  inode->i_ino);
 			spin_unlock(&nilfs->ns_inode_lock);
 			return -EINVAL; /*
 					 * NILFS_I_DIRTY may remain for
@@ -959,8 +959,9 @@ int __nilfs_mark_inode_dirty(struct inode *inode, int flags)
 
 	err = nilfs_load_inode_block(inode, &ibh);
 	if (unlikely(err)) {
-		nilfs_warning(inode->i_sb, __func__,
-			      "failed to reget inode block.");
+		nilfs_msg(inode->i_sb, KERN_WARNING,
+			  "cannot mark inode dirty (ino=%lu): error %d loading inode block",
+			  inode->i_ino, err);
 		return err;
 	}
 	nilfs_update_inode(inode, ibh, flags);
@@ -986,8 +987,8 @@ void nilfs_dirty_inode(struct inode *inode, int flags)
 	struct nilfs_mdt_info *mdi = NILFS_MDT(inode);
 
 	if (is_bad_inode(inode)) {
-		nilfs_warning(inode->i_sb, __func__,
-			      "tried to mark bad_inode dirty. ignored.");
+		nilfs_msg(inode->i_sb, KERN_WARNING,
+			  "tried to mark bad_inode dirty. ignored.");
 		dump_stack();
 		return;
 	}
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 1ec8ae5995a5..dbcf1dc93a51 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -283,9 +283,9 @@ static int nilfs_do_unlink(struct inode *dir, struct dentry *dentry)
 		goto out;
 
 	if (!inode->i_nlink) {
-		nilfs_warning(inode->i_sb, __func__,
-			      "deleting nonexistent file (%lu), %d",
-			      inode->i_ino, inode->i_nlink);
+		nilfs_msg(inode->i_sb, KERN_WARNING,
+			  "deleting nonexistent file (ino=%lu), %d",
+			  inode->i_ino, inode->i_nlink);
 		set_nlink(inode, 1);
 	}
 	err = nilfs_delete_entry(de, page);
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index b57ce41e8a1a..46fbd4e00315 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -306,8 +306,6 @@ void __nilfs_msg(struct super_block *sb, const char *level,
 extern __printf(3, 4)
 void __nilfs_error(struct super_block *sb, const char *function,
 		   const char *fmt, ...);
-extern __printf(3, 4)
-void nilfs_warning(struct super_block *, const char *, const char *, ...);
 
 #ifdef CONFIG_PRINTK
 
@@ -319,7 +317,10 @@ void nilfs_warning(struct super_block *, const char *, const char *, ...);
 #else
 
 #define nilfs_msg(sb, level, fmt, ...)					\
-	no_printk(fmt, ##__VA_ARGS__)
+	do {								\
+		no_printk(fmt, ##__VA_ARGS__);				\
+		(void)(sb);						\
+	} while (0)
 #define nilfs_error(sb, fmt, ...)					\
 	do {								\
 		no_printk(fmt, ##__VA_ARGS__);				\
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index d97ba5f11b77..eaccf12c296e 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -403,11 +403,10 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
 
 	BUG_ON(!PageLocked(page));
 
-	if (!silent) {
-		nilfs_warning(sb, __func__,
-				"discard page: offset %lld, ino %lu",
-				page_offset(page), inode->i_ino);
-	}
+	if (!silent)
+		nilfs_msg(sb, KERN_WARNING,
+			  "discard dirty page: offset=%lld, ino=%lu",
+			  page_offset(page), inode->i_ino);
 
 	ClearPageUptodate(page);
 	ClearPageMappedToDisk(page);
@@ -422,11 +421,11 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
 		bh = head = page_buffers(page);
 		do {
 			lock_buffer(bh);
-			if (!silent) {
-				nilfs_warning(sb, __func__,
-					"discard block %llu, size %zu",
-					(u64)bh->b_blocknr, bh->b_size);
-			}
+			if (!silent)
+				nilfs_msg(sb, KERN_WARNING,
+					  "discard dirty block: blocknr=%llu, size=%zu",
+					  (u64)bh->b_blocknr, bh->b_size);
+
 			set_mask_bits(&bh->b_state, clear_bits, 0);
 			unlock_buffer(bh);
 		} while (bh = bh->b_this_page, bh != head);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 1cc968502e53..7e1864c6035b 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1951,8 +1951,9 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
 			err = nilfs_ifile_get_inode_block(
 				ifile, ii->vfs_inode.i_ino, &ibh);
 			if (unlikely(err)) {
-				nilfs_warning(sci->sc_super, __func__,
-					      "failed to get inode block.");
+				nilfs_msg(sci->sc_super, KERN_WARNING,
+					  "log writer: error %d getting inode block (ino=%lu)",
+					  err, ii->vfs_inode.i_ino);
 				return err;
 			}
 			mark_buffer_dirty(ibh);
@@ -2458,8 +2459,7 @@ int nilfs_clean_segments(struct super_block *sb, struct nilfs_argv *argv,
 		if (likely(!err))
 			break;
 
-		nilfs_warning(sb, __func__,
-			      "segment construction failed. (err=%d)", err);
+		nilfs_msg(sb, KERN_WARNING, "error %d cleaning segments", err);
 		set_current_state(TASK_INTERRUPTIBLE);
 		schedule_timeout(sci->sc_interval);
 	}
@@ -2738,14 +2738,14 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci)
 		nilfs_segctor_write_out(sci);
 
 	if (!list_empty(&sci->sc_dirty_files)) {
-		nilfs_warning(sci->sc_super, __func__,
-			      "dirty file(s) after the final construction");
+		nilfs_msg(sci->sc_super, KERN_WARNING,
+			  "disposed unprocessed dirty file(s) when stopping log writer");
 		nilfs_dispose_list(nilfs, &sci->sc_dirty_files, 1);
 	}
 
 	if (!list_empty(&sci->sc_iput_queue)) {
-		nilfs_warning(sci->sc_super, __func__,
-			      "iput queue is not empty");
+		nilfs_msg(sci->sc_super, KERN_WARNING,
+			  "disposed unprocessed inode(s) in iput queue when stopping log writer");
 		nilfs_dispose_list(nilfs, &sci->sc_iput_queue, 1);
 	}
 
@@ -2821,8 +2821,8 @@ void nilfs_detach_log_writer(struct super_block *sb)
 	spin_lock(&nilfs->ns_inode_lock);
 	if (!list_empty(&nilfs->ns_dirty_files)) {
 		list_splice_init(&nilfs->ns_dirty_files, &garbage_list);
-		nilfs_warning(sb, __func__,
-			      "Hit dirty file after stopped log writer");
+		nilfs_msg(sb, KERN_WARNING,
+			  "disposed unprocessed dirty file(s) when detaching log writer");
 	}
 	spin_unlock(&nilfs->ns_inode_lock);
 	up_write(&nilfs->ns_segctor_sem);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 90c62b489857..33ba6f78de69 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -115,8 +115,7 @@ static void nilfs_set_error(struct super_block *sb)
  *
  * This implements the body of nilfs_error() macro.  Normally,
  * nilfs_error() should be used.  As for sustainable errors such as a
- * single-shot I/O error, nilfs_warning() or printk() should be used
- * instead.
+ * single-shot I/O error, nilfs_msg() should be used instead.
  *
  * Callers should not add a trailing newline since this will do it.
  */
@@ -151,24 +150,6 @@ void __nilfs_error(struct super_block *sb, const char *function,
 		      sb->s_id);
 }
 
-void nilfs_warning(struct super_block *sb, const char *function,
-		   const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk(KERN_WARNING "NILFS warning (device %s): %s: %pV\n",
-	       sb->s_id, function, &vaf);
-
-	va_end(args);
-}
-
-
 struct inode *nilfs_alloc_inode(struct super_block *sb)
 {
 	struct nilfs_inode_info *ii;

From 39a9dcca61a3d1375b9440676cbfc541804cd217 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:17 -0700
Subject: [PATCH 065/111] nilfs2: emit error message when I/O error is detected

When nilfs returns -EIO as an error code, it's not always clear whether
it came from the underlying block device or not.  Mend this by having
the low-level I/O routines of nilfs output an error message when they
detect an I/O error.

Link: http://lkml.kernel.org/r/1464875891-5443-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/btree.c   | 3 +++
 fs/nilfs2/gcinode.c | 9 ++++++++-
 fs/nilfs2/mdt.c     | 6 +++++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index 2c52693a69a4..2e315f9f2e51 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -517,6 +517,9 @@ static int __nilfs_btree_get_block(const struct nilfs_bmap *btree, __u64 ptr,
 
  out_no_wait:
 	if (!buffer_uptodate(bh)) {
+		nilfs_msg(btree->b_inode->i_sb, KERN_ERR,
+			  "I/O error reading b-tree node block (ino=%lu, blocknr=%llu)",
+			  btree->b_inode->i_ino, (unsigned long long)ptr);
 		brelse(bh);
 		return -EIO;
 	}
diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index e9148f94d696..853a831dcde0 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -148,8 +148,15 @@ int nilfs_gccache_submit_read_node(struct inode *inode, sector_t pbn,
 int nilfs_gccache_wait_and_mark_dirty(struct buffer_head *bh)
 {
 	wait_on_buffer(bh);
-	if (!buffer_uptodate(bh))
+	if (!buffer_uptodate(bh)) {
+		struct inode *inode = bh->b_page->mapping->host;
+
+		nilfs_msg(inode->i_sb, KERN_ERR,
+			  "I/O error reading %s block for GC (ino=%lu, vblocknr=%llu)",
+			  buffer_nilfs_node(bh) ? "node" : "data",
+			  inode->i_ino, (unsigned long long)bh->b_blocknr);
 		return -EIO;
+	}
 	if (buffer_dirty(bh))
 		return -EEXIST;
 
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 0d7b71fbeff8..d56d3a5bea88 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -207,8 +207,12 @@ static int nilfs_mdt_read_block(struct inode *inode, unsigned long block,
 
  out_no_wait:
 	err = -EIO;
-	if (!buffer_uptodate(first_bh))
+	if (!buffer_uptodate(first_bh)) {
+		nilfs_msg(inode->i_sb, KERN_ERR,
+			  "I/O error reading meta-data file (ino=%lu, block-offset=%lu)",
+			  inode->i_ino, block);
 		goto failed_bh;
+	}
  out:
 	*out_bh = first_bh;
 	return 0;

From aceb4170bb2ba88c5327cc69b9c91a708c7f7046 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:19 -0700
Subject: [PATCH 066/111] nilfs2: do not use yield()

Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock(), since this usage corresponds to the "be nice
for others" case described in the kernel-doc comment of yield().

This removes the following checkpatch.pl warning:

 "WARNING: Using yield() is generally wrong. See yield() kernel-doc
  (sched/core.c)"

Link: http://lkml.kernel.org/r/1464875891-5443-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 7e1864c6035b..5a97282aa074 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -373,7 +373,7 @@ static void nilfs_transaction_lock(struct super_block *sb,
 		nilfs_segctor_do_immediate_flush(sci);
 
 		up_write(&nilfs->ns_segctor_sem);
-		yield();
+		cond_resched();
 	}
 	if (gcflag)
 		ti->ti_flags |= NILFS_TI_GC;

From a7d3f104da57eecb2b9881127d6bdf9abe7fde99 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:22 -0700
Subject: [PATCH 067/111] nilfs2: refactor parser of snapshot mount option

Move the parser of the snapshot mount option into a separate function,
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid the checkpatch.pl warning "WARNING: simple_strtoull
is obsolete, use kstrtoull instead", and refine the error messages of
the parser.
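
As a rough user-space analogue of the stricter parsing, the sketch below
uses strtoull() with explicit end-pointer and errno checks.  The
function name demo_parse_cno() is hypothetical, and it is slightly
stricter than kstrtoull(), which also tolerates a leading '+' and a
single trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of the checks described above.
 * Returns NULL on success and stores the value in *cno; otherwise
 * returns a message describing why the argument was rejected.
 */
static const char *demo_parse_cno(const char *arg, unsigned long long *cno)
{
	unsigned long long val;
	char *end;

	assert(arg && cno);

	/* strtoull() silently accepts blanks and signs; reject them. */
	if (!isdigit((unsigned char)arg[0]))
		return "malformed argument";

	errno = 0;
	val = strtoull(arg, &end, 0);	/* base 0: decimal, 0x hex, 0 octal */
	if (errno == ERANGE)
		return "too large checkpoint number";
	if (end == arg || *end != '\0')
		return "malformed argument";	/* trailing garbage */
	if (val == 0)
		return "invalid checkpoint number 0";

	*cno = val;
	return NULL;
}
```

Unlike the old simple_strtoull() call, which could only signal failure
through a zero result, this shape lets the caller tell overflow,
malformed input and the reserved value 0 apart.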

Link: http://lkml.kernel.org/r/1464875891-5443-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/super.c | 53 +++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 33ba6f78de69..c95d369e90aa 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1205,6 +1205,38 @@ struct nilfs_super_data {
 	int flags;
 };
 
+static int nilfs_parse_snapshot_option(const char *option,
+				       const substring_t *arg,
+				       struct nilfs_super_data *sd)
+{
+	unsigned long long val;
+	const char *msg = NULL;
+	int err;
+
+	if (!(sd->flags & MS_RDONLY)) {
+		msg = "read-only option is not specified";
+		goto parse_error;
+	}
+
+	err = kstrtoull(arg->from, 0, &val);
+	if (err) {
+		if (err == -ERANGE)
+			msg = "too large checkpoint number";
+		else
+			msg = "malformed argument";
+		goto parse_error;
+	} else if (val == 0) {
+		msg = "invalid checkpoint number 0";
+		goto parse_error;
+	}
+	sd->cno = val;
+	return 0;
+
+parse_error:
+	nilfs_msg(NULL, KERN_ERR, "invalid option \"%s\": %s", option, msg);
+	return 1;
+}
+
 /**
  * nilfs_identify - pre-read mount options needed to identify mount instance
  * @data: mount options
@@ -1221,24 +1253,9 @@ static int nilfs_identify(char *data, struct nilfs_super_data *sd)
 		p = strsep(&options, ",");
 		if (p != NULL && *p) {
 			token = match_token(p, tokens, args);
-			if (token == Opt_snapshot) {
-				if (!(sd->flags & MS_RDONLY)) {
-					ret++;
-				} else {
-					sd->cno = simple_strtoull(args[0].from,
-								  NULL, 0);
-					/*
-					 * No need to see the end pointer;
-					 * match_token() has done syntax
-					 * checking.
-					 */
-					if (sd->cno == 0)
-						ret++;
-				}
-			}
-			if (ret)
-				nilfs_msg(NULL, KERN_ERR,
-					  "invalid mount option: %s", p);
+			if (token == Opt_snapshot)
+				ret = nilfs_parse_snapshot_option(p, &args[0],
+								  sd);
 		}
 		if (!options)
 			break;

From ad980c9ab77cc4e0edc5dd3361fd69daabeb99f9 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:25 -0700
Subject: [PATCH 068/111] nilfs2: fix misuse of a semaphore in sysfs code

The variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks are protected by
ns_segctor_sem, but the nilfs sysfs code wrongly takes ns_sem when
reading them.  This fixes the misuse and clarifies which semaphore
protects them in the comment of the_nilfs struct.

Link: http://lkml.kernel.org/r/1465825507-3407-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/sysfs.c     | 44 +++++++++++++++++++++----------------------
 fs/nilfs2/the_nilfs.h |  7 ++-----
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/fs/nilfs2/sysfs.c b/fs/nilfs2/sysfs.c
index 8e57bb91fe16..490303e3d517 100644
--- a/fs/nilfs2/sysfs.c
+++ b/fs/nilfs2/sysfs.c
@@ -326,9 +326,9 @@ nilfs_checkpoints_next_checkpoint_show(struct nilfs_checkpoints_attr *attr,
 {
 	__u64 cno;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	cno = nilfs->ns_cno;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", cno);
 }
@@ -511,9 +511,9 @@ nilfs_segctor_current_seg_sequence_show(struct nilfs_segctor_attr *attr,
 {
 	u64 seg_seq;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	seg_seq = nilfs->ns_seg_seq;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", seg_seq);
 }
@@ -525,9 +525,9 @@ nilfs_segctor_current_last_full_seg_show(struct nilfs_segctor_attr *attr,
 {
 	__u64 segnum;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	segnum = nilfs->ns_segnum;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", segnum);
 }
@@ -539,9 +539,9 @@ nilfs_segctor_next_full_seg_show(struct nilfs_segctor_attr *attr,
 {
 	__u64 nextnum;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	nextnum = nilfs->ns_nextnum;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", nextnum);
 }
@@ -553,9 +553,9 @@ nilfs_segctor_next_pseg_offset_show(struct nilfs_segctor_attr *attr,
 {
 	unsigned long pseg_offset;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	pseg_offset = nilfs->ns_pseg_offset;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%lu\n", pseg_offset);
 }
@@ -567,9 +567,9 @@ nilfs_segctor_next_checkpoint_show(struct nilfs_segctor_attr *attr,
 {
 	__u64 cno;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	cno = nilfs->ns_cno;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", cno);
 }
@@ -581,9 +581,9 @@ nilfs_segctor_last_seg_write_time_show(struct nilfs_segctor_attr *attr,
 {
 	time_t ctime;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	ctime = nilfs->ns_ctime;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return NILFS_SHOW_TIME(ctime, buf);
 }
@@ -595,9 +595,9 @@ nilfs_segctor_last_seg_write_time_secs_show(struct nilfs_segctor_attr *attr,
 {
 	time_t ctime;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	ctime = nilfs->ns_ctime;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n", (unsigned long long)ctime);
 }
@@ -609,9 +609,9 @@ nilfs_segctor_last_nongc_write_time_show(struct nilfs_segctor_attr *attr,
 {
 	time_t nongc_ctime;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	nongc_ctime = nilfs->ns_nongc_ctime;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return NILFS_SHOW_TIME(nongc_ctime, buf);
 }
@@ -623,9 +623,9 @@ nilfs_segctor_last_nongc_write_time_secs_show(struct nilfs_segctor_attr *attr,
 {
 	time_t nongc_ctime;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	nongc_ctime = nilfs->ns_nongc_ctime;
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%llu\n",
 			(unsigned long long)nongc_ctime);
@@ -638,9 +638,9 @@ nilfs_segctor_dirty_data_blocks_count_show(struct nilfs_segctor_attr *attr,
 {
 	u32 ndirtyblks;
 
-	down_read(&nilfs->ns_sem);
+	down_read(&nilfs->ns_segctor_sem);
 	ndirtyblks = atomic_read(&nilfs->ns_ndirtyblks);
-	up_read(&nilfs->ns_sem);
+	up_read(&nilfs->ns_segctor_sem);
 
 	return snprintf(buf, PAGE_SIZE, "%u\n", ndirtyblks);
 }
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 79d1421896d0..b305c6f033e7 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -122,11 +122,8 @@ struct the_nilfs {
 	unsigned int		ns_sb_update_freq;
 
 	/*
-	 * Following fields are dedicated to a writable FS-instance.
-	 * Except for the period seeking checkpoint, code outside the segment
-	 * constructor must lock a segment semaphore while accessing these
-	 * fields.
-	 * The writable FS-instance is sole during a lifetime of the_nilfs.
+	 * The following fields are updated by a writable FS-instance.
+	 * These fields are protected by ns_segctor_sem outside load_nilfs().
 	 */
 	u64			ns_seg_seq;
 	__u64			ns_segnum;

From 4ce5c3426cbe9193f82345fb103e17dc3335eb4f Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:28 -0700
Subject: [PATCH 069/111] nilfs2: use BIT() macro

Replace open-coded bit shifts with the BIT() macro for clarity.
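
As a rough illustration of the conversion (a user-space sketch, not part
of this patch), the snippet below assumes BIT() expands to
"1UL << (nr)" as in the kernel's bitops header.  The EX_* bit numbers
and ex_* helpers are hypothetical names chosen for the example.

```c
#include <assert.h>

/* Assumption: same expansion as the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical bit numbers, for illustration only (not nilfs2 values). */
enum { EX_BIT_A = 0, EX_BIT_B = 3 };

/* Build a mask the way the converted code does. */
unsigned long ex_mask(void)
{
	return BIT(EX_BIT_A) | BIT(EX_BIT_B);
}

/* Check that the value is unchanged and show the type difference. */
void ex_selfcheck(void)
{
	/* Same value as the open-coded shifts. */
	assert(ex_mask() == ((1UL << EX_BIT_A) | (1UL << EX_BIT_B)));
	/* BIT() yields unsigned long, while "1 << n" yields int. */
	assert(sizeof(BIT(0)) == sizeof(unsigned long));
	assert(sizeof(1 << 0) == sizeof(int));
}
```

Under that assumption, masks that were built with "1 << n" (or cast to
unsigned int, as NILFS_MDT_INO_BITS was) become unsigned long after the
conversion, while their values stay the same.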

Link: http://lkml.kernel.org/r/1465825507-3407-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/nilfs2/btnode.c  |  4 ++--
 fs/nilfs2/inode.c   |  4 ++--
 fs/nilfs2/nilfs.h   | 15 +++++++--------
 fs/nilfs2/page.c    | 26 +++++++++++++-------------
 fs/nilfs2/segment.c | 14 +++++++-------
 fs/nilfs2/sufile.c  | 12 ++++++------
 6 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 4cca998ec7a0..d5c23da43513 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -41,7 +41,7 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
 	struct inode *inode = NILFS_BTNC_I(btnc);
 	struct buffer_head *bh;
 
-	bh = nilfs_grab_buffer(inode, btnc, blocknr, 1 << BH_NILFS_Node);
+	bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
 	if (unlikely(!bh))
 		return NULL;
 
@@ -70,7 +70,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
 	struct page *page;
 	int err;
 
-	bh = nilfs_grab_buffer(inode, btnc, blocknr, 1 << BH_NILFS_Node);
+	bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
 	if (unlikely(!bh))
 		return -ENOMEM;
 
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b286b35174a5..af04f553d7c9 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -356,7 +356,7 @@ struct inode *nilfs_new_inode(struct inode *dir, umode_t mode)
 
 	root = NILFS_I(dir)->i_root;
 	ii = NILFS_I(inode);
-	ii->i_state = 1 << NILFS_I_NEW;
+	ii->i_state = BIT(NILFS_I_NEW);
 	ii->i_root = root;
 
 	err = nilfs_ifile_create_inode(root->ifile, &ino, &ii->i_bh);
@@ -555,7 +555,7 @@ static int nilfs_iget_set(struct inode *inode, void *opaque)
 
 	inode->i_ino = args->ino;
 	if (args->for_gc) {
-		NILFS_I(inode)->i_state = 1 << NILFS_I_GCINODE;
+		NILFS_I(inode)->i_state = BIT(NILFS_I_GCINODE);
 		NILFS_I(inode)->i_cno = args->cno;
 		NILFS_I(inode)->i_root = NULL;
 	} else {
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 46fbd4e00315..2ba8a146af1f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -119,20 +119,19 @@ enum {
 /*
  * Macros to check inode numbers
  */
-#define NILFS_MDT_INO_BITS   \
-	((unsigned int)(1 << NILFS_DAT_INO | 1 << NILFS_CPFILE_INO |	\
-			1 << NILFS_SUFILE_INO | 1 << NILFS_IFILE_INO |	\
-			1 << NILFS_ATIME_INO | 1 << NILFS_SKETCH_INO))
+#define NILFS_MDT_INO_BITS						\
+	(BIT(NILFS_DAT_INO) | BIT(NILFS_CPFILE_INO) |			\
+	 BIT(NILFS_SUFILE_INO) | BIT(NILFS_IFILE_INO) |			\
+	 BIT(NILFS_ATIME_INO) | BIT(NILFS_SKETCH_INO))
 
-#define NILFS_SYS_INO_BITS   \
-	((unsigned int)(1 << NILFS_ROOT_INO) | NILFS_MDT_INO_BITS)
+#define NILFS_SYS_INO_BITS (BIT(NILFS_ROOT_INO) | NILFS_MDT_INO_BITS)
 
 #define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
 
 #define NILFS_MDT_INODE(sb, ino) \
-	((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & (1 << (ino))))
+	((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
 #define NILFS_VALID_INODE(sb, ino) \
-	((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & (1 << (ino))))
+	((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
 
 /**
  * struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index eaccf12c296e..f11a3ad2df0c 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -30,9 +30,9 @@
 #include "mdt.h"
 
 
-#define NILFS_BUFFER_INHERENT_BITS  \
-	((1UL << BH_Uptodate) | (1UL << BH_Mapped) | (1UL << BH_NILFS_Node) | \
-	 (1UL << BH_NILFS_Volatile) | (1UL << BH_NILFS_Checked))
+#define NILFS_BUFFER_INHERENT_BITS					\
+	(BIT(BH_Uptodate) | BIT(BH_Mapped) | BIT(BH_NILFS_Node) |	\
+	 BIT(BH_NILFS_Volatile) | BIT(BH_NILFS_Checked))
 
 static struct buffer_head *
 __nilfs_get_page_block(struct page *page, unsigned long block, pgoff_t index,
@@ -85,9 +85,9 @@ void nilfs_forget_buffer(struct buffer_head *bh)
 {
 	struct page *page = bh->b_page;
 	const unsigned long clear_bits =
-		(1 << BH_Uptodate | 1 << BH_Dirty | 1 << BH_Mapped |
-		 1 << BH_Async_Write | 1 << BH_NILFS_Volatile |
-		 1 << BH_NILFS_Checked | 1 << BH_NILFS_Redirected);
+		(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
+		 BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
+		 BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
 
 	lock_buffer(bh);
 	set_mask_bits(&bh->b_state, clear_bits, 0);
@@ -124,17 +124,17 @@ void nilfs_copy_buffer(struct buffer_head *dbh, struct buffer_head *sbh)
 	dbh->b_bdev = sbh->b_bdev;
 
 	bh = dbh;
-	bits = sbh->b_state & ((1UL << BH_Uptodate) | (1UL << BH_Mapped));
+	bits = sbh->b_state & (BIT(BH_Uptodate) | BIT(BH_Mapped));
 	while ((bh = bh->b_this_page) != dbh) {
 		lock_buffer(bh);
 		bits &= bh->b_state;
 		unlock_buffer(bh);
 	}
-	if (bits & (1UL << BH_Uptodate))
+	if (bits & BIT(BH_Uptodate))
 		SetPageUptodate(dpage);
 	else
 		ClearPageUptodate(dpage);
-	if (bits & (1UL << BH_Mapped))
+	if (bits & BIT(BH_Mapped))
 		SetPageMappedToDisk(dpage);
 	else
 		ClearPageMappedToDisk(dpage);
@@ -215,7 +215,7 @@ static void nilfs_copy_page(struct page *dst, struct page *src, int copy_dirty)
 		create_empty_buffers(dst, sbh->b_size, 0);
 
 	if (copy_dirty)
-		mask |= (1UL << BH_Dirty);
+		mask |= BIT(BH_Dirty);
 
 	dbh = dbufs = page_buffers(dst);
 	do {
@@ -414,9 +414,9 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
 	if (page_has_buffers(page)) {
 		struct buffer_head *bh, *head;
 		const unsigned long clear_bits =
-			(1 << BH_Uptodate | 1 << BH_Dirty | 1 << BH_Mapped |
-			 1 << BH_Async_Write | 1 << BH_NILFS_Volatile |
-			 1 << BH_NILFS_Checked | 1 << BH_NILFS_Redirected);
+			(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
+			 BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
+			 BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
 
 		bh = head = page_buffers(page);
 		do {
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 5a97282aa074..bedcae2c28e6 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1858,11 +1858,11 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
 		 */
 		list_for_each_entry(bh, &segbuf->sb_payload_buffers,
 				    b_assoc_buffers) {
-			const unsigned long set_bits = (1 << BH_Uptodate);
+			const unsigned long set_bits = BIT(BH_Uptodate);
 			const unsigned long clear_bits =
-				(1 << BH_Dirty | 1 << BH_Async_Write |
-				 1 << BH_Delay | 1 << BH_NILFS_Volatile |
-				 1 << BH_NILFS_Redirected);
+				(BIT(BH_Dirty) | BIT(BH_Async_Write) |
+				 BIT(BH_Delay) | BIT(BH_NILFS_Volatile) |
+				 BIT(BH_NILFS_Redirected));
 
 			set_mask_bits(&bh->b_state, clear_bits, set_bits);
 			if (bh == segbuf->sb_super_root) {
@@ -2132,10 +2132,10 @@ static void nilfs_segctor_start_timer(struct nilfs_sc_info *sci)
 static void nilfs_segctor_do_flush(struct nilfs_sc_info *sci, int bn)
 {
 	spin_lock(&sci->sc_state_lock);
-	if (!(sci->sc_flush_request & (1 << bn))) {
+	if (!(sci->sc_flush_request & BIT(bn))) {
 		unsigned long prev_req = sci->sc_flush_request;
 
-		sci->sc_flush_request |= (1 << bn);
+		sci->sc_flush_request |= BIT(bn);
 		if (!prev_req)
 			wake_up(&sci->sc_wait_daemon);
 	}
@@ -2319,7 +2319,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
 }
 
 #define FLUSH_FILE_BIT	(0x1) /* data file only */
-#define FLUSH_DAT_BIT	(1 << NILFS_DAT_INO) /* DAT only */
+#define FLUSH_DAT_BIT	BIT(NILFS_DAT_INO) /* DAT only */
 
 /**
  * nilfs_segctor_accept - record accepted sequence count of log-write requests
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 5b495c469471..12d11de93602 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -446,7 +446,7 @@ void nilfs_sufile_do_scrap(struct inode *sufile, __u64 segnum,
 
 	kaddr = kmap_atomic(su_bh->b_page);
 	su = nilfs_sufile_block_get_segment_usage(sufile, segnum, su_bh, kaddr);
-	if (su->su_flags == cpu_to_le32(1UL << NILFS_SEGMENT_USAGE_DIRTY) &&
+	if (su->su_flags == cpu_to_le32(BIT(NILFS_SEGMENT_USAGE_DIRTY)) &&
 	    su->su_nblocks == cpu_to_le32(0)) {
 		kunmap_atomic(kaddr);
 		return;
@@ -457,7 +457,7 @@ void nilfs_sufile_do_scrap(struct inode *sufile, __u64 segnum,
 	/* make the segment garbage */
 	su->su_lastmod = cpu_to_le64(0);
 	su->su_nblocks = cpu_to_le32(0);
-	su->su_flags = cpu_to_le32(1UL << NILFS_SEGMENT_USAGE_DIRTY);
+	su->su_flags = cpu_to_le32(BIT(NILFS_SEGMENT_USAGE_DIRTY));
 	kunmap_atomic(kaddr);
 
 	nilfs_sufile_mod_counter(header_bh, clean ? (u64)-1 : 0, dirty ? 0 : 1);
@@ -695,7 +695,7 @@ static int nilfs_sufile_truncate_range(struct inode *sufile,
 		su2 = su;
 		for (j = 0; j < n; j++, su = (void *)su + susz) {
 			if ((le32_to_cpu(su->su_flags) &
-			     ~(1UL << NILFS_SEGMENT_USAGE_ERROR)) ||
+			     ~BIT(NILFS_SEGMENT_USAGE_ERROR)) ||
 			    nilfs_segment_is_active(nilfs, segnum + j)) {
 				ret = -EBUSY;
 				kunmap_atomic(kaddr);
@@ -862,10 +862,10 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
 			si->sui_lastmod = le64_to_cpu(su->su_lastmod);
 			si->sui_nblocks = le32_to_cpu(su->su_nblocks);
 			si->sui_flags = le32_to_cpu(su->su_flags) &
-				~(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
+				~BIT(NILFS_SEGMENT_USAGE_ACTIVE);
 			if (nilfs_segment_is_active(nilfs, segnum + j))
 				si->sui_flags |=
-					(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
+					BIT(NILFS_SEGMENT_USAGE_ACTIVE);
 		}
 		kunmap_atomic(kaddr);
 		brelse(su_bh);
@@ -953,7 +953,7 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
 			 * disk.
 			 */
 			sup->sup_sui.sui_flags &=
-					~(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
+					~BIT(NILFS_SEGMENT_USAGE_ACTIVE);
 
 			cleansi = nilfs_suinfo_clean(&sup->sup_sui);
 			cleansu = nilfs_segment_usage_clean(su);

From e63e88bc53bac7e4c3f592f8126c51a7569be673 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Date: Tue, 2 Aug 2016 14:05:30 -0700
Subject: [PATCH 070/111] nilfs2: move ioctl interface and disk layout to uapi
 separately

The header file "include/linux/nilfs2_fs.h" consists of two parts, the
ioctl interface and the disk format, and both are intended to be shared
with user space programs.

This moves them to the uapi directory "include/uapi/linux", splitting
the file into "nilfs2_api.h" and "nilfs2_ondisk.h".  The following minor
changes accompany this migration:

 - The nilfs_direct_node struct in nilfs2/direct.h is moved into
   nilfs2_ondisk.h because it's an on-disk structure.
 - The inline functions nilfs_rec_len_from_disk() and
   nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.
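
For readers unfamiliar with the record length helpers mentioned above,
the following user-space sketch shows the encoding they implement.  It
is an approximation under stated assumptions: host byte order is used
in place of the le16 conversions, the large-page branch is always
enabled (the kernel only compiles it when PAGE_SIZE >= 65536), and
assert() stands in for BUG_ON().  The ex_* and EX_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as NILFS_MAX_REC_LEN. */
#define EX_MAX_REC_LEN ((1 << 16) - 1)

/* Decode: 0xFFFF on disk stands for a 65536-byte record. */
unsigned int ex_rec_len_from_disk(uint16_t dlen)
{
	unsigned int len = dlen;

	if (len == EX_MAX_REC_LEN)
		return 1 << 16;
	return len;
}

/* Encode: 65536 does not fit in 16 bits, so 0xFFFF is stored instead. */
uint16_t ex_rec_len_to_disk(unsigned int len)
{
	assert(len <= (1 << 16));	/* stands in for BUG_ON() */
	if (len == (1 << 16))
		return EX_MAX_REC_LEN;
	return (uint16_t)len;
}
```

In this sketch a 65536-byte record round-trips through the 16-bit
on-disk field, and smaller lengths pass through unchanged.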

Link: http://lkml.kernel.org/r/1465825507-3407-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/filesystems/nilfs2.txt          |   3 +-
 Documentation/ioctl/ioctl-number.txt          |   2 +-
 MAINTAINERS                                   |   3 +-
 fs/nilfs2/bmap.h                              |   2 +-
 fs/nilfs2/btree.h                             |   2 +-
 fs/nilfs2/cpfile.c                            |   1 -
 fs/nilfs2/cpfile.h                            |   3 +-
 fs/nilfs2/dat.h                               |   1 +
 fs/nilfs2/dir.c                               |  22 ++
 fs/nilfs2/direct.h                            |  10 -
 fs/nilfs2/ifile.h                             |   1 -
 fs/nilfs2/ioctl.c                             |   1 -
 fs/nilfs2/nilfs.h                             |   3 +-
 fs/nilfs2/segment.h                           |   1 -
 fs/nilfs2/sufile.c                            |   1 -
 fs/nilfs2/sufile.h                            |   1 -
 include/uapi/linux/nilfs2_api.h               | 292 ++++++++++++++++
 .../linux/nilfs2_ondisk.h}                    | 328 ++----------------
 18 files changed, 348 insertions(+), 329 deletions(-)
 create mode 100644 include/uapi/linux/nilfs2_api.h
 rename include/{linux/nilfs2_fs.h => uapi/linux/nilfs2_ondisk.h} (68%)

diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
index 5b21ef76f751..c0727dc36271 100644
--- a/Documentation/filesystems/nilfs2.txt
+++ b/Documentation/filesystems/nilfs2.txt
@@ -267,7 +267,8 @@ among NILFS2 files can be depicted as follows:
                                   `-- file (ino=yy)
                                     ( regular file, directory, or symlink )
 
-For detail on the format of each file, please see include/linux/nilfs2_fs.h.
+For detail on the format of each file, please see nilfs2_ondisk.h
+located at include/uapi/linux directory.
 
 There are no patents or other intellectual property that we protect
 with regard to the design of NILFS2.  It is allowed to replicate the
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 56af5e43e9c0..81c7f2bb7daf 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -248,7 +248,7 @@ Code  Seq#(hex)	Include File		Comments
 'm'	00	drivers/scsi/megaraid/megaraid_ioctl.h	conflict!
 'm'	00-1F	net/irda/irmod.h	conflict!
 'n'	00-7F	linux/ncp_fs.h and fs/ncpfs/ioctl.c
-'n'	80-8F	linux/nilfs2_fs.h	NILFS2
+'n'	80-8F	uapi/linux/nilfs2_api.h	NILFS2
 'n'	E0-FF	linux/matroxfb.h	matroxfb
 'o'	00-1F	fs/ocfs2/ocfs2_fs.h	OCFS2
 'o'     00-03   mtd/ubi-user.h		conflict! (OCFS2 and UBI overlaps)
diff --git a/MAINTAINERS b/MAINTAINERS
index bb51bbbc9e1d..e9eacacf0f08 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8258,8 +8258,9 @@ T:	git git://github.com/konis/nilfs2.git
 S:	Supported
 F:	Documentation/filesystems/nilfs2.txt
 F:	fs/nilfs2/
-F:	include/linux/nilfs2_fs.h
 F:	include/trace/events/nilfs2.h
+F:	include/uapi/linux/nilfs2_api.h
+F:	include/uapi/linux/nilfs2_ondisk.h
 
 NINJA SCSI-3 / NINJA SCSI-32Bi (16bit/CardBus) PCMCIA SCSI HOST ADAPTER DRIVER
 M:	YOKOTA Hiroshi <yokota@netlab.is.tsukuba.ac.jp>
diff --git a/fs/nilfs2/bmap.h b/fs/nilfs2/bmap.h
index b6a4c8f93ac8..2b6ffbe5997a 100644
--- a/fs/nilfs2/bmap.h
+++ b/fs/nilfs2/bmap.h
@@ -22,7 +22,7 @@
 #include <linux/types.h>
 #include <linux/fs.h>
 #include <linux/buffer_head.h>
-#include <linux/nilfs2_fs.h>
+#include <linux/nilfs2_ondisk.h>	/* nilfs_binfo, nilfs_inode, etc */
 #include "alloc.h"
 #include "dat.h"
 
diff --git a/fs/nilfs2/btree.h b/fs/nilfs2/btree.h
index df1a25faa83b..2184e47fa4bf 100644
--- a/fs/nilfs2/btree.h
+++ b/fs/nilfs2/btree.h
@@ -22,7 +22,7 @@
 #include <linux/types.h>
 #include <linux/buffer_head.h>
 #include <linux/list.h>
-#include <linux/nilfs2_fs.h>
+#include <linux/nilfs2_ondisk.h>	/* nilfs_btree_node */
 #include "btnode.h"
 #include "bmap.h"
 
diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c
index 19d9f4ae8347..a15a1601e931 100644
--- a/fs/nilfs2/cpfile.c
+++ b/fs/nilfs2/cpfile.c
@@ -21,7 +21,6 @@
 #include <linux/string.h>
 #include <linux/buffer_head.h>
 #include <linux/errno.h>
-#include <linux/nilfs2_fs.h>
 #include "mdt.h"
 #include "cpfile.h"
 
diff --git a/fs/nilfs2/cpfile.h b/fs/nilfs2/cpfile.h
index 0249744ae234..6eca972f9673 100644
--- a/fs/nilfs2/cpfile.h
+++ b/fs/nilfs2/cpfile.h
@@ -21,7 +21,8 @@
 
 #include <linux/fs.h>
 #include <linux/buffer_head.h>
-#include <linux/nilfs2_fs.h>
+#include <linux/nilfs2_api.h>		/* nilfs_cpstat */
+#include <linux/nilfs2_ondisk.h>	/* nilfs_inode, nilfs_checkpoint */
 
 
 int nilfs_cpfile_get_checkpoint(struct inode *, __u64, int,
diff --git a/fs/nilfs2/dat.h b/fs/nilfs2/dat.h
index abbfdabcabea..57dc6cf466d0 100644
--- a/fs/nilfs2/dat.h
+++ b/fs/nilfs2/dat.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/buffer_head.h>
 #include <linux/fs.h>
+#include <linux/nilfs2_ondisk.h>	/* nilfs_inode, nilfs_checkpoint */
 
 
 struct nilfs_palloc_req;
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 746956d2937a..908ebbf0ac7e 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -42,6 +42,28 @@
 #include "nilfs.h"
 #include "page.h"
 
+static inline unsigned int nilfs_rec_len_from_disk(__le16 dlen)
+{
+	unsigned int len = le16_to_cpu(dlen);
+
+#if (PAGE_SIZE >= 65536)
+	if (len == NILFS_MAX_REC_LEN)
+		return 1 << 16;
+#endif
+	return len;
+}
+
+static inline __le16 nilfs_rec_len_to_disk(unsigned int len)
+{
+#if (PAGE_SIZE >= 65536)
+	if (len == (1 << 16))
+		return cpu_to_le16(NILFS_MAX_REC_LEN);
+
+	BUG_ON(len > (1 << 16));
+#endif
+	return cpu_to_le16(len);
+}
+
 /*
  * nilfs uses block-sized chunks. Arguably, sector-sized ones would be
  * more robust, but we have what we have
diff --git a/fs/nilfs2/direct.h b/fs/nilfs2/direct.h
index 3015a6e78724..cfe85e848bba 100644
--- a/fs/nilfs2/direct.h
+++ b/fs/nilfs2/direct.h
@@ -24,16 +24,6 @@
 #include "bmap.h"
 
 
-/**
- * struct nilfs_direct_node - direct node
- * @dn_flags: flags
- * @dn_pad: padding
- */
-struct nilfs_direct_node {
-	__u8 dn_flags;
-	__u8 pad[7];
-};
-
 #define NILFS_DIRECT_NBLOCKS	(NILFS_BMAP_SIZE / sizeof(__le64) - 1)
 #define NILFS_DIRECT_KEY_MIN	0
 #define NILFS_DIRECT_KEY_MAX	(NILFS_DIRECT_NBLOCKS - 1)
diff --git a/fs/nilfs2/ifile.h b/fs/nilfs2/ifile.h
index 23ad2f091e76..188b94fe0ec5 100644
--- a/fs/nilfs2/ifile.h
+++ b/fs/nilfs2/ifile.h
@@ -23,7 +23,6 @@
 
 #include <linux/fs.h>
 #include <linux/buffer_head.h>
-#include <linux/nilfs2_fs.h>
 #include "mdt.h"
 #include "alloc.h"
 
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 827283fe9525..f1d7989459fd 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -25,7 +25,6 @@
 #include <linux/compat.h>	/* compat_ptr() */
 #include <linux/mount.h>	/* mnt_want_write_file(), mnt_drop_write_file() */
 #include <linux/buffer_head.h>
-#include <linux/nilfs2_fs.h>
 #include "nilfs.h"
 #include "segment.h"
 #include "bmap.h"
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 2ba8a146af1f..33f8c8fc96e8 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -23,7 +23,8 @@
 #include <linux/buffer_head.h>
 #include <linux/spinlock.h>
 #include <linux/blkdev.h>
-#include <linux/nilfs2_fs.h>
+#include <linux/nilfs2_api.h>
+#include <linux/nilfs2_ondisk.h>
 #include "the_nilfs.h"
 #include "bmap.h"
 
diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h
index 6565c10b7b76..1060949d7dd2 100644
--- a/fs/nilfs2/segment.h
+++ b/fs/nilfs2/segment.h
@@ -23,7 +23,6 @@
 #include <linux/fs.h>
 #include <linux/buffer_head.h>
 #include <linux/workqueue.h>
-#include <linux/nilfs2_fs.h>
 #include "nilfs.h"
 
 struct nilfs_root;
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 12d11de93602..1541a1e9221a 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -22,7 +22,6 @@
 #include <linux/string.h>
 #include <linux/buffer_head.h>
 #include <linux/errno.h>
-#include <linux/nilfs2_fs.h>
 #include "mdt.h"
 #include "sufile.h"
 
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index 46e89872294c..158a9190c8ec 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -21,7 +21,6 @@
 
 #include <linux/fs.h>
 #include <linux/buffer_head.h>
-#include <linux/nilfs2_fs.h>
 #include "mdt.h"
 
 
diff --git a/include/uapi/linux/nilfs2_api.h b/include/uapi/linux/nilfs2_api.h
new file mode 100644
index 000000000000..ef4c1de89b11
--- /dev/null
+++ b/include/uapi/linux/nilfs2_api.h
@@ -0,0 +1,292 @@
+/*
+ * nilfs2_api.h - NILFS2 user space API
+ *
+ * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; either version 2.1 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _LINUX_NILFS2_API_H
+#define _LINUX_NILFS2_API_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/**
+ * struct nilfs_cpinfo - checkpoint information
+ * @ci_flags: flags
+ * @ci_pad: padding
+ * @ci_cno: checkpoint number
+ * @ci_create: creation timestamp
+ * @ci_nblk_inc: number of blocks incremented by this checkpoint
+ * @ci_inodes_count: inodes count
+ * @ci_blocks_count: blocks count
+ * @ci_next: next checkpoint number in snapshot list
+ */
+struct nilfs_cpinfo {
+	__u32 ci_flags;
+	__u32 ci_pad;
+	__u64 ci_cno;
+	__u64 ci_create;
+	__u64 ci_nblk_inc;
+	__u64 ci_inodes_count;
+	__u64 ci_blocks_count;
+	__u64 ci_next;
+};
+
+/* checkpoint flags */
+enum {
+	NILFS_CPINFO_SNAPSHOT,
+	NILFS_CPINFO_INVALID,
+	NILFS_CPINFO_SKETCH,
+	NILFS_CPINFO_MINOR,
+};
+
+#define NILFS_CPINFO_FNS(flag, name)					\
+static inline int							\
+nilfs_cpinfo_##name(const struct nilfs_cpinfo *cpinfo)			\
+{									\
+	return !!(cpinfo->ci_flags & (1UL << NILFS_CPINFO_##flag));	\
+}
+
+NILFS_CPINFO_FNS(SNAPSHOT, snapshot)
+NILFS_CPINFO_FNS(INVALID, invalid)
+NILFS_CPINFO_FNS(MINOR, minor)
+
+/**
+ * nilfs_suinfo - segment usage information
+ * @sui_lastmod: timestamp of last modification
+ * @sui_nblocks: number of written blocks in segment
+ * @sui_flags: segment usage flags
+ */
+struct nilfs_suinfo {
+	__u64 sui_lastmod;
+	__u32 sui_nblocks;
+	__u32 sui_flags;
+};
+
+/* segment usage flags */
+enum {
+	NILFS_SUINFO_ACTIVE,
+	NILFS_SUINFO_DIRTY,
+	NILFS_SUINFO_ERROR,
+};
+
+#define NILFS_SUINFO_FNS(flag, name)					\
+static inline int							\
+nilfs_suinfo_##name(const struct nilfs_suinfo *si)			\
+{									\
+	return si->sui_flags & (1UL << NILFS_SUINFO_##flag);		\
+}
+
+NILFS_SUINFO_FNS(ACTIVE, active)
+NILFS_SUINFO_FNS(DIRTY, dirty)
+NILFS_SUINFO_FNS(ERROR, error)
+
+static inline int nilfs_suinfo_clean(const struct nilfs_suinfo *si)
+{
+	return !si->sui_flags;
+}
+
+/**
+ * nilfs_suinfo_update - segment usage information update
+ * @sup_segnum: segment number
+ * @sup_flags: flags for which fields are active in sup_sui
+ * @sup_reserved: reserved necessary for alignment
+ * @sup_sui: segment usage information
+ */
+struct nilfs_suinfo_update {
+	__u64 sup_segnum;
+	__u32 sup_flags;
+	__u32 sup_reserved;
+	struct nilfs_suinfo sup_sui;
+};
+
+enum {
+	NILFS_SUINFO_UPDATE_LASTMOD,
+	NILFS_SUINFO_UPDATE_NBLOCKS,
+	NILFS_SUINFO_UPDATE_FLAGS,
+	__NR_NILFS_SUINFO_UPDATE_FIELDS,
+};
+
+#define NILFS_SUINFO_UPDATE_FNS(flag, name)				\
+static inline void							\
+nilfs_suinfo_update_set_##name(struct nilfs_suinfo_update *sup)		\
+{									\
+	sup->sup_flags |= 1UL << NILFS_SUINFO_UPDATE_##flag;		\
+}									\
+static inline void							\
+nilfs_suinfo_update_clear_##name(struct nilfs_suinfo_update *sup)	\
+{									\
+	sup->sup_flags &= ~(1UL << NILFS_SUINFO_UPDATE_##flag);		\
+}									\
+static inline int							\
+nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup)	\
+{									\
+	return !!(sup->sup_flags & (1UL << NILFS_SUINFO_UPDATE_##flag));\
+}
+
+NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
+NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
+NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
+
+enum {
+	NILFS_CHECKPOINT,
+	NILFS_SNAPSHOT,
+};
+
+/**
+ * struct nilfs_cpmode - change checkpoint mode structure
+ * @cm_cno: checkpoint number
+ * @cm_mode: mode of checkpoint
+ * @cm_pad: padding
+ */
+struct nilfs_cpmode {
+	__u64 cm_cno;
+	__u32 cm_mode;
+	__u32 cm_pad;
+};
+
+/**
+ * struct nilfs_argv - argument vector
+ * @v_base: pointer on data array from userspace
+ * @v_nmembs: number of members in data array
+ * @v_size: size of data array in bytes
+ * @v_flags: flags
+ * @v_index: start number of target data items
+ */
+struct nilfs_argv {
+	__u64 v_base;
+	__u32 v_nmembs;	/* number of members */
+	__u16 v_size;	/* size of members */
+	__u16 v_flags;
+	__u64 v_index;
+};
+
+/**
+ * struct nilfs_period - period of checkpoint numbers
+ * @p_start: start checkpoint number (inclusive)
+ * @p_end: end checkpoint number (exclusive)
+ */
+struct nilfs_period {
+	__u64 p_start;
+	__u64 p_end;
+};
+
+/**
+ * struct nilfs_cpstat - checkpoint statistics
+ * @cs_cno: checkpoint number
+ * @cs_ncps: number of checkpoints
+ * @cs_nsss: number of snapshots
+ */
+struct nilfs_cpstat {
+	__u64 cs_cno;
+	__u64 cs_ncps;
+	__u64 cs_nsss;
+};
+
+/**
+ * struct nilfs_sustat - segment usage statistics
+ * @ss_nsegs: number of segments
+ * @ss_ncleansegs: number of clean segments
+ * @ss_ndirtysegs: number of dirty segments
+ * @ss_ctime: creation time of the last segment
+ * @ss_nongc_ctime: creation time of the last segment not for GC
+ * @ss_prot_seq: least sequence number of segments which must not be reclaimed
+ */
+struct nilfs_sustat {
+	__u64 ss_nsegs;
+	__u64 ss_ncleansegs;
+	__u64 ss_ndirtysegs;
+	__u64 ss_ctime;
+	__u64 ss_nongc_ctime;
+	__u64 ss_prot_seq;
+};
+
+/**
+ * struct nilfs_vinfo - virtual block number information
+ * @vi_vblocknr: virtual block number
+ * @vi_start: start checkpoint number (inclusive)
+ * @vi_end: end checkpoint number (exclusive)
+ * @vi_blocknr: disk block number
+ */
+struct nilfs_vinfo {
+	__u64 vi_vblocknr;
+	__u64 vi_start;
+	__u64 vi_end;
+	__u64 vi_blocknr;
+};
+
+/**
+ * struct nilfs_vdesc - descriptor of virtual block number
+ * @vd_ino: inode number
+ * @vd_cno: checkpoint number
+ * @vd_vblocknr: virtual block number
+ * @vd_period: period of checkpoint numbers
+ * @vd_blocknr: disk block number
+ * @vd_offset: logical block offset inside a file
+ * @vd_flags: flags (data or node block)
+ * @vd_pad: padding
+ */
+struct nilfs_vdesc {
+	__u64 vd_ino;
+	__u64 vd_cno;
+	__u64 vd_vblocknr;
+	struct nilfs_period vd_period;
+	__u64 vd_blocknr;
+	__u64 vd_offset;
+	__u32 vd_flags;
+	__u32 vd_pad;
+};
+
+/**
+ * struct nilfs_bdesc - descriptor of disk block number
+ * @bd_ino: inode number
+ * @bd_oblocknr: disk block address (for skipping dead blocks)
+ * @bd_blocknr: disk block address
+ * @bd_offset: logical block offset inside a file
+ * @bd_level: level in the b-tree organization
+ * @bd_pad: padding
+ */
+struct nilfs_bdesc {
+	__u64 bd_ino;
+	__u64 bd_oblocknr;
+	__u64 bd_blocknr;
+	__u64 bd_offset;
+	__u32 bd_level;
+	__u32 bd_pad;
+};
+
+#define NILFS_IOCTL_IDENT	'n'
+
+#define NILFS_IOCTL_CHANGE_CPMODE					\
+	_IOW(NILFS_IOCTL_IDENT, 0x80, struct nilfs_cpmode)
+#define NILFS_IOCTL_DELETE_CHECKPOINT					\
+	_IOW(NILFS_IOCTL_IDENT, 0x81, __u64)
+#define NILFS_IOCTL_GET_CPINFO						\
+	_IOR(NILFS_IOCTL_IDENT, 0x82, struct nilfs_argv)
+#define NILFS_IOCTL_GET_CPSTAT						\
+	_IOR(NILFS_IOCTL_IDENT, 0x83, struct nilfs_cpstat)
+#define NILFS_IOCTL_GET_SUINFO						\
+	_IOR(NILFS_IOCTL_IDENT, 0x84, struct nilfs_argv)
+#define NILFS_IOCTL_GET_SUSTAT						\
+	_IOR(NILFS_IOCTL_IDENT, 0x85, struct nilfs_sustat)
+#define NILFS_IOCTL_GET_VINFO						\
+	_IOWR(NILFS_IOCTL_IDENT, 0x86, struct nilfs_argv)
+#define NILFS_IOCTL_GET_BDESCS						\
+	_IOWR(NILFS_IOCTL_IDENT, 0x87, struct nilfs_argv)
+#define NILFS_IOCTL_CLEAN_SEGMENTS					\
+	_IOW(NILFS_IOCTL_IDENT, 0x88, struct nilfs_argv[5])
+#define NILFS_IOCTL_SYNC						\
+	_IOR(NILFS_IOCTL_IDENT, 0x8A, __u64)
+#define NILFS_IOCTL_RESIZE						\
+	_IOW(NILFS_IOCTL_IDENT, 0x8B, __u64)
+#define NILFS_IOCTL_SET_ALLOC_RANGE					\
+	_IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
+#define NILFS_IOCTL_SET_SUINFO						\
+	_IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)
+
+#endif /* _LINUX_NILFS2_API_H */
diff --git a/include/linux/nilfs2_fs.h b/include/uapi/linux/nilfs2_ondisk.h
similarity index 68%
rename from include/linux/nilfs2_fs.h
rename to include/uapi/linux/nilfs2_ondisk.h
index 5988dd57ba66..2a8a3addb675 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/uapi/linux/nilfs2_ondisk.h
@@ -1,5 +1,5 @@
 /*
- * nilfs2_fs.h - NILFS2 on-disk structures and common declarations.
+ * nilfs2_ondisk.h - NILFS2 on-disk structures
  *
  * Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
  *
@@ -7,13 +7,6 @@
  * it under the terms of the GNU Lesser General Public License as published
  * by the Free Software Foundation; either version 2.1 of the License, or
  * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU Lesser General Public License for more details.
- *
- * Written by Koji Sato and Ryusuke Konishi.
  */
 /*
  *  linux/include/linux/ext2_fs.h
@@ -30,16 +23,15 @@
  *  Copyright (C) 1991, 1992  Linus Torvalds
  */
 
-#ifndef _LINUX_NILFS_FS_H
-#define _LINUX_NILFS_FS_H
+#ifndef _LINUX_NILFS2_ONDISK_H
+#define _LINUX_NILFS2_ONDISK_H
 
 #include <linux/types.h>
-#include <linux/ioctl.h>
 #include <linux/magic.h>
-#include <linux/bug.h>
 
 
 #define NILFS_INODE_BMAP_SIZE	7
+
 /**
  * struct nilfs_inode - structure of an inode on disk
  * @i_blocks: blocks count
@@ -56,7 +48,7 @@
  * @i_bmap: block mapping
  * @i_xattr: extended attributes
  * @i_generation: file generation (for NFS)
- * @i_pad:	padding
+ * @i_pad: padding
  */
 struct nilfs_inode {
 	__le64	i_blocks;
@@ -338,29 +330,7 @@ enum {
 #define NILFS_DIR_ROUND			(NILFS_DIR_PAD - 1)
 #define NILFS_DIR_REC_LEN(name_len)	(((name_len) + 12 + NILFS_DIR_ROUND) & \
 					~NILFS_DIR_ROUND)
-#define NILFS_MAX_REC_LEN		((1<<16)-1)
-
-static inline unsigned int nilfs_rec_len_from_disk(__le16 dlen)
-{
-	unsigned int len = le16_to_cpu(dlen);
-
-#if !defined(__KERNEL__) || (PAGE_SIZE >= 65536)
-	if (len == NILFS_MAX_REC_LEN)
-		return 1 << 16;
-#endif
-	return len;
-}
-
-static inline __le16 nilfs_rec_len_to_disk(unsigned int len)
-{
-#if !defined(__KERNEL__) || (PAGE_SIZE >= 65536)
-	if (len == (1 << 16))
-		return cpu_to_le16(NILFS_MAX_REC_LEN);
-	else if (len > (1 << 16))
-		BUG();
-#endif
-	return cpu_to_le16(len);
-}
+#define NILFS_MAX_REC_LEN		((1 << 16) - 1)
 
 /**
  * struct nilfs_finfo - file information
@@ -374,11 +344,10 @@ struct nilfs_finfo {
 	__le64 fi_cno;
 	__le32 fi_nblocks;
 	__le32 fi_ndatablk;
-	/* array of virtual block numbers */
 };
 
 /**
- * struct nilfs_binfo_v - information for the block to which a virtual block number is assigned
+ * struct nilfs_binfo_v - information on a data block (except DAT)
  * @bi_vblocknr: virtual block number
  * @bi_blkoff: block offset
  */
@@ -388,7 +357,7 @@ struct nilfs_binfo_v {
 };
 
 /**
- * struct nilfs_binfo_dat - information for the block which belongs to the DAT file
+ * struct nilfs_binfo_dat - information on a DAT node block
  * @bi_blkoff: block offset
  * @bi_level: level
  * @bi_pad: padding
@@ -454,7 +423,7 @@ struct nilfs_segment_summary {
 #define NILFS_SS_GC     0x0010  /* segment written for cleaner operation */
 
 /**
- * struct nilfs_btree_node - B-tree node
+ * struct nilfs_btree_node - header of B-tree node block
  * @bn_flags: flags
  * @bn_level: level
  * @bn_nchildren: number of children
@@ -475,6 +444,16 @@ struct nilfs_btree_node {
 #define NILFS_BTREE_LEVEL_NODE_MIN      (NILFS_BTREE_LEVEL_DATA + 1)
 #define NILFS_BTREE_LEVEL_MAX           14	/* Max level (exclusive) */
 
+/**
+ * struct nilfs_direct_node - header of built-in bmap array
+ * @dn_flags: flags
+ * @pad: padding
+ */
+struct nilfs_direct_node {
+	__u8 dn_flags;
+	__u8 pad[7];
+};
+
 /**
  * struct nilfs_palloc_group_desc - block group descriptor
  * @pg_nfrees: number of free entries in block group
@@ -573,40 +552,6 @@ NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot)
 NILFS_CHECKPOINT_FNS(INVALID, invalid)
 NILFS_CHECKPOINT_FNS(MINOR, minor)
 
-/**
- * struct nilfs_cpinfo - checkpoint information
- * @ci_flags: flags
- * @ci_pad: padding
- * @ci_cno: checkpoint number
- * @ci_create: creation timestamp
- * @ci_nblk_inc: number of blocks incremented by this checkpoint
- * @ci_inodes_count: inodes count
- * @ci_blocks_count: blocks count
- * @ci_next: next checkpoint number in snapshot list
- */
-struct nilfs_cpinfo {
-	__u32 ci_flags;
-	__u32 ci_pad;
-	__u64 ci_cno;
-	__u64 ci_create;
-	__u64 ci_nblk_inc;
-	__u64 ci_inodes_count;
-	__u64 ci_blocks_count;
-	__u64 ci_next;
-};
-
-#define NILFS_CPINFO_FNS(flag, name)					\
-static inline int							\
-nilfs_cpinfo_##name(const struct nilfs_cpinfo *cpinfo)			\
-{									\
-	return !!(cpinfo->ci_flags & (1UL << NILFS_CHECKPOINT_##flag));	\
-}
-
-NILFS_CPINFO_FNS(SNAPSHOT, snapshot)
-NILFS_CPINFO_FNS(INVALID, invalid)
-NILFS_CPINFO_FNS(MINOR, minor)
-
-
 /**
  * struct nilfs_cpfile_header - checkpoint file header
  * @ch_ncheckpoints: number of checkpoints
@@ -619,7 +564,7 @@ struct nilfs_cpfile_header {
 	struct nilfs_snapshot_list ch_snapshot_list;
 };
 
-#define NILFS_CPFILE_FIRST_CHECKPOINT_OFFSET	\
+#define NILFS_CPFILE_FIRST_CHECKPOINT_OFFSET				\
 	((sizeof(struct nilfs_cpfile_header) +				\
 	  sizeof(struct nilfs_checkpoint) - 1) /			\
 			sizeof(struct nilfs_checkpoint))
@@ -643,8 +588,6 @@ enum {
 	NILFS_SEGMENT_USAGE_ACTIVE,
 	NILFS_SEGMENT_USAGE_DIRTY,
 	NILFS_SEGMENT_USAGE_ERROR,
-
-	/* ... */
 };
 
 #define NILFS_SEGMENT_USAGE_FNS(flag, name)				\
@@ -699,236 +642,9 @@ struct nilfs_sufile_header {
 	/* ... */
 };
 
-#define NILFS_SUFILE_FIRST_SEGMENT_USAGE_OFFSET	\
+#define NILFS_SUFILE_FIRST_SEGMENT_USAGE_OFFSET				\
 	((sizeof(struct nilfs_sufile_header) +				\
 	  sizeof(struct nilfs_segment_usage) - 1) /			\
 			 sizeof(struct nilfs_segment_usage))
 
-/**
- * nilfs_suinfo - segment usage information
- * @sui_lastmod: timestamp of last modification
- * @sui_nblocks: number of written blocks in segment
- * @sui_flags: segment usage flags
- */
-struct nilfs_suinfo {
-	__u64 sui_lastmod;
-	__u32 sui_nblocks;
-	__u32 sui_flags;
-};
-
-#define NILFS_SUINFO_FNS(flag, name)					\
-static inline int							\
-nilfs_suinfo_##name(const struct nilfs_suinfo *si)			\
-{									\
-	return si->sui_flags & (1UL << NILFS_SEGMENT_USAGE_##flag);	\
-}
-
-NILFS_SUINFO_FNS(ACTIVE, active)
-NILFS_SUINFO_FNS(DIRTY, dirty)
-NILFS_SUINFO_FNS(ERROR, error)
-
-static inline int nilfs_suinfo_clean(const struct nilfs_suinfo *si)
-{
-	return !si->sui_flags;
-}
-
-/* ioctl */
-/**
- * nilfs_suinfo_update - segment usage information update
- * @sup_segnum: segment number
- * @sup_flags: flags for which fields are active in sup_sui
- * @sup_reserved: reserved necessary for alignment
- * @sup_sui: segment usage information
- */
-struct nilfs_suinfo_update {
-	__u64 sup_segnum;
-	__u32 sup_flags;
-	__u32 sup_reserved;
-	struct nilfs_suinfo sup_sui;
-};
-
-enum {
-	NILFS_SUINFO_UPDATE_LASTMOD,
-	NILFS_SUINFO_UPDATE_NBLOCKS,
-	NILFS_SUINFO_UPDATE_FLAGS,
-	__NR_NILFS_SUINFO_UPDATE_FIELDS,
-};
-
-#define NILFS_SUINFO_UPDATE_FNS(flag, name)				\
-static inline void							\
-nilfs_suinfo_update_set_##name(struct nilfs_suinfo_update *sup)		\
-{									\
-	sup->sup_flags |= 1UL << NILFS_SUINFO_UPDATE_##flag;		\
-}									\
-static inline void							\
-nilfs_suinfo_update_clear_##name(struct nilfs_suinfo_update *sup)	\
-{									\
-	sup->sup_flags &= ~(1UL << NILFS_SUINFO_UPDATE_##flag);		\
-}									\
-static inline int							\
-nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup)	\
-{									\
-	return !!(sup->sup_flags & (1UL << NILFS_SUINFO_UPDATE_##flag));\
-}
-
-NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
-NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
-NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
-
-enum {
-	NILFS_CHECKPOINT,
-	NILFS_SNAPSHOT,
-};
-
-/**
- * struct nilfs_cpmode - change checkpoint mode structure
- * @cm_cno: checkpoint number
- * @cm_mode: mode of checkpoint
- * @cm_pad: padding
- */
-struct nilfs_cpmode {
-	__u64 cm_cno;
-	__u32 cm_mode;
-	__u32 cm_pad;
-};
-
-/**
- * struct nilfs_argv - argument vector
- * @v_base: pointer on data array from userspace
- * @v_nmembs: number of members in data array
- * @v_size: size of data array in bytes
- * @v_flags: flags
- * @v_index: start number of target data items
- */
-struct nilfs_argv {
-	__u64 v_base;
-	__u32 v_nmembs;	/* number of members */
-	__u16 v_size;	/* size of members */
-	__u16 v_flags;
-	__u64 v_index;
-};
-
-/**
- * struct nilfs_period - period of checkpoint numbers
- * @p_start: start checkpoint number (inclusive)
- * @p_end: end checkpoint number (exclusive)
- */
-struct nilfs_period {
-	__u64 p_start;
-	__u64 p_end;
-};
-
-/**
- * struct nilfs_cpstat - checkpoint statistics
- * @cs_cno: checkpoint number
- * @cs_ncps: number of checkpoints
- * @cs_nsss: number of snapshots
- */
-struct nilfs_cpstat {
-	__u64 cs_cno;
-	__u64 cs_ncps;
-	__u64 cs_nsss;
-};
-
-/**
- * struct nilfs_sustat - segment usage statistics
- * @ss_nsegs: number of segments
- * @ss_ncleansegs: number of clean segments
- * @ss_ndirtysegs: number of dirty segments
- * @ss_ctime: creation time of the last segment
- * @ss_nongc_ctime: creation time of the last segment not for GC
- * @ss_prot_seq: least sequence number of segments which must not be reclaimed
- */
-struct nilfs_sustat {
-	__u64 ss_nsegs;
-	__u64 ss_ncleansegs;
-	__u64 ss_ndirtysegs;
-	__u64 ss_ctime;
-	__u64 ss_nongc_ctime;
-	__u64 ss_prot_seq;
-};
-
-/**
- * struct nilfs_vinfo - virtual block number information
- * @vi_vblocknr: virtual block number
- * @vi_start: start checkpoint number (inclusive)
- * @vi_end: end checkpoint number (exclusive)
- * @vi_blocknr: disk block number
- */
-struct nilfs_vinfo {
-	__u64 vi_vblocknr;
-	__u64 vi_start;
-	__u64 vi_end;
-	__u64 vi_blocknr;
-};
-
-/**
- * struct nilfs_vdesc - descriptor of virtual block number
- * @vd_ino: inode number
- * @vd_cno: checkpoint number
- * @vd_vblocknr: virtual block number
- * @vd_period: period of checkpoint numbers
- * @vd_blocknr: disk block number
- * @vd_offset: logical block offset inside a file
- * @vd_flags: flags (data or node block)
- * @vd_pad: padding
- */
-struct nilfs_vdesc {
-	__u64 vd_ino;
-	__u64 vd_cno;
-	__u64 vd_vblocknr;
-	struct nilfs_period vd_period;
-	__u64 vd_blocknr;
-	__u64 vd_offset;
-	__u32 vd_flags;
-	__u32 vd_pad;
-};
-
-/**
- * struct nilfs_bdesc - descriptor of disk block number
- * @bd_ino: inode number
- * @bd_oblocknr: disk block address (for skipping dead blocks)
- * @bd_blocknr: disk block address
- * @bd_offset: logical block offset inside a file
- * @bd_level: level in the b-tree organization
- * @bd_pad: padding
- */
-struct nilfs_bdesc {
-	__u64 bd_ino;
-	__u64 bd_oblocknr;
-	__u64 bd_blocknr;
-	__u64 bd_offset;
-	__u32 bd_level;
-	__u32 bd_pad;
-};
-
-#define NILFS_IOCTL_IDENT		'n'
-
-#define NILFS_IOCTL_CHANGE_CPMODE  \
-	_IOW(NILFS_IOCTL_IDENT, 0x80, struct nilfs_cpmode)
-#define NILFS_IOCTL_DELETE_CHECKPOINT  \
-	_IOW(NILFS_IOCTL_IDENT, 0x81, __u64)
-#define NILFS_IOCTL_GET_CPINFO  \
-	_IOR(NILFS_IOCTL_IDENT, 0x82, struct nilfs_argv)
-#define NILFS_IOCTL_GET_CPSTAT  \
-	_IOR(NILFS_IOCTL_IDENT, 0x83, struct nilfs_cpstat)
-#define NILFS_IOCTL_GET_SUINFO  \
-	_IOR(NILFS_IOCTL_IDENT, 0x84, struct nilfs_argv)
-#define NILFS_IOCTL_GET_SUSTAT  \
-	_IOR(NILFS_IOCTL_IDENT, 0x85, struct nilfs_sustat)
-#define NILFS_IOCTL_GET_VINFO  \
-	_IOWR(NILFS_IOCTL_IDENT, 0x86, struct nilfs_argv)
-#define NILFS_IOCTL_GET_BDESCS  \
-	_IOWR(NILFS_IOCTL_IDENT, 0x87, struct nilfs_argv)
-#define NILFS_IOCTL_CLEAN_SEGMENTS  \
-	_IOW(NILFS_IOCTL_IDENT, 0x88, struct nilfs_argv[5])
-#define NILFS_IOCTL_SYNC  \
-	_IOR(NILFS_IOCTL_IDENT, 0x8A, __u64)
-#define NILFS_IOCTL_RESIZE  \
-	_IOW(NILFS_IOCTL_IDENT, 0x8B, __u64)
-#define NILFS_IOCTL_SET_ALLOC_RANGE  \
-	_IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
-#define NILFS_IOCTL_SET_SUINFO  \
-	_IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)
-
-#endif	/* _LINUX_NILFS_FS_H */
+#endif	/* _LINUX_NILFS2_ONDISK_H */

From 0a11b9aae49adf1f952427ef1a1d9e793dd6ffb6 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm@suse.com>
Date: Tue, 2 Aug 2016 14:05:33 -0700
Subject: [PATCH 071/111] reiserfs: fix "new_insert_key may be used
 uninitialized ..."

new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key.  We can key off of
that to avoid the uninitialized warning.
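
As a rough user-space illustration of the pattern (not the reiserfs
code itself), the sketch below copies the key out only when a new
pointer was actually produced.  The names publish_insert, struct
key_buf and the KEY_SIZE value of 16 are hypothetical and exist only
for this example.

```c
#include <stddef.h>
#include <string.h>

#define KEY_SIZE 16	/* hypothetical key size, for this sketch only */

struct key_buf {
	unsigned char bytes[KEY_SIZE];
};

/*
 * Store the new pointer unconditionally, but copy the key only when
 * the pointer is non-NULL: new_key is meaningful only in that case.
 * Returns 1 if the key was copied, 0 otherwise.
 */
static int publish_insert(void **insert_ptr, void *new_ptr,
			  struct key_buf *key_out,
			  const struct key_buf *new_key)
{
	insert_ptr[0] = new_ptr;
	if (new_ptr) {
		memcpy(key_out, new_key, KEY_SIZE);
		return 1;
	}
	return 0;
}
```

Keying the copy off the pointer means the possibly uninitialized key
is never read on the path where no new node was created.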

Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/reiserfs/ibalance.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/reiserfs/ibalance.c b/fs/reiserfs/ibalance.c
index b751eea32e20..5db6f45b3fed 100644
--- a/fs/reiserfs/ibalance.c
+++ b/fs/reiserfs/ibalance.c
@@ -1153,8 +1153,9 @@ int balance_internal(struct tree_balance *tb,
 				       insert_ptr);
 	}
 
-	memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);
 	insert_ptr[0] = new_insert_ptr;
+	if (new_insert_ptr)
+		memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);
 
 	return order;
 }

From 7e7814180b334dff97ef8f56c7c40c277ad4531c Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto@kernel.org>
Date: Tue, 2 Aug 2016 14:05:36 -0700
Subject: [PATCH 072/111] signal: consolidate {TS,TLF}_RESTORE_SIGMASK code

In general, there's no need for the "restore sigmask" flag to live in
ti->flags.  alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.

Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.

Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
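
A minimal user-space sketch of the bitfield approach follows.  It is
only an illustration: struct task is a hypothetical stand-in for
struct task_struct, the task_* helper names are invented here, and
the accessors take an explicit pointer instead of using current.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct task_struct; only the flag matters. */
struct task {
	unsigned in_iowait:1;
	unsigned restore_sigmask:1;
};

static void task_set_restore_sigmask(struct task *t)
{
	t->restore_sigmask = true;
}

static void task_clear_restore_sigmask(struct task *t)
{
	t->restore_sigmask = false;
}

static bool task_test_restore_sigmask(const struct task *t)
{
	return t->restore_sigmask;
}

/* Report whether the flag was set, and leave it cleared either way. */
static bool task_test_and_clear_restore_sigmask(struct task *t)
{
	if (!t->restore_sigmask)
		return false;
	t->restore_sigmask = false;
	return true;
}
```

The flag is only ever touched by the task itself, so plain loads and
stores on the bitfield are enough and no atomic operations are needed.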

Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/alpha/include/asm/thread_info.h      | 27 ----------
 arch/ia64/include/asm/thread_info.h       | 28 ----------
 arch/microblaze/include/asm/thread_info.h | 27 ----------
 arch/powerpc/include/asm/thread_info.h    | 25 ---------
 arch/sh/include/asm/thread_info.h         | 26 ----------
 arch/sparc/include/asm/thread_info_64.h   | 24 ---------
 arch/tile/include/asm/thread_info.h       | 27 ----------
 arch/x86/include/asm/thread_info.h        | 24 ---------
 include/linux/sched.h                     | 63 +++++++++++++++++++++++
 include/linux/thread_info.h               | 41 ---------------
 10 files changed, 63 insertions(+), 249 deletions(-)

diff --git a/arch/alpha/include/asm/thread_info.h b/arch/alpha/include/asm/thread_info.h
index 32e920a83ae5..e9e90bfa2b50 100644
--- a/arch/alpha/include/asm/thread_info.h
+++ b/arch/alpha/include/asm/thread_info.h
@@ -86,33 +86,6 @@ register struct thread_info *__current_thread_info __asm__("$8");
 #define TS_UAC_NOPRINT		0x0001	/* ! Preserve the following three */
 #define TS_UAC_NOFIX		0x0002	/* ! flags as they match          */
 #define TS_UAC_SIGBUS		0x0004	/* ! userspace part of 'osf_sysinfo' */
-#define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal() */
-
-#ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, (unsigned long *)&ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
-#endif
 
 #define SET_UNALIGN_CTL(task,value)	({				\
 	__u32 status = task_thread_info(task)->status & ~UAC_BITMASK;	\
diff --git a/arch/ia64/include/asm/thread_info.h b/arch/ia64/include/asm/thread_info.h
index d1212b84fb83..29bd59790d6c 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -121,32 +121,4 @@ struct thread_info {
 /* like TIF_ALLWORK_BITS but sans TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT */
 #define TIF_WORK_MASK		(TIF_ALLWORK_MASK&~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT))
 
-#define TS_RESTORE_SIGMASK	2	/* restore signal mask in do_signal() */
-
-#ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, &ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
-#endif	/* !__ASSEMBLY__ */
-
 #endif /* _ASM_IA64_THREAD_INFO_H */
diff --git a/arch/microblaze/include/asm/thread_info.h b/arch/microblaze/include/asm/thread_info.h
index 383f387b4eee..e7e8954e9815 100644
--- a/arch/microblaze/include/asm/thread_info.h
+++ b/arch/microblaze/include/asm/thread_info.h
@@ -148,33 +148,6 @@ static inline struct thread_info *current_thread_info(void)
  */
 /* FPU was used by this task this quantum (SMP) */
 #define TS_USEDFPU		0x0001
-#define TS_RESTORE_SIGMASK	0x0002
-
-#ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK 1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, (unsigned long *)&ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
-#endif
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_MICROBLAZE_THREAD_INFO_H */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index b21bb1f72314..87e4b2d8dcd4 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -138,40 +138,15 @@ static inline struct thread_info *current_thread_info(void)
 /* Don't move TLF_NAPPING without adjusting the code in entry_32.S */
 #define TLF_NAPPING		0	/* idle thread enabled NAP mode */
 #define TLF_SLEEPING		1	/* suspend code enabled SLEEP mode */
-#define TLF_RESTORE_SIGMASK	2	/* Restore signal mask in do_signal */
 #define TLF_LAZY_MMU		3	/* tlb_batch is active */
 #define TLF_RUNLATCH		4	/* Is the runlatch enabled? */
 
 #define _TLF_NAPPING		(1 << TLF_NAPPING)
 #define _TLF_SLEEPING		(1 << TLF_SLEEPING)
-#define _TLF_RESTORE_SIGMASK	(1 << TLF_RESTORE_SIGMASK)
 #define _TLF_LAZY_MMU		(1 << TLF_LAZY_MMU)
 #define _TLF_RUNLATCH		(1 << TLF_RUNLATCH)
 
 #ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->local_flags |= _TLF_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, &ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->local_flags &= ~_TLF_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->local_flags & _TLF_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->local_flags & _TLF_RESTORE_SIGMASK))
-		return false;
-	ti->local_flags &= ~_TLF_RESTORE_SIGMASK;
-	return true;
-}
 
 static inline bool test_thread_local_flags(unsigned int flags)
 {
diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h
index 2afa321157be..6c65dcd470ab 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -151,19 +151,10 @@ extern void init_thread_xstate(void);
  * ever touches our thread-synchronous status, so we don't
  * have to worry about atomic accesses.
  */
-#define TS_RESTORE_SIGMASK	0x0001	/* restore signal mask in do_signal() */
 #define TS_USEDFPU		0x0002	/* FPU used by this task this quantum */
 
 #ifndef __ASSEMBLY__
 
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, (unsigned long *)&ti->flags));
-}
-
 #define TI_FLAG_FAULT_CODE_SHIFT	24
 
 /*
@@ -182,23 +173,6 @@ static inline unsigned int get_thread_fault_code(void)
 	return ti->flags >> TI_FLAG_FAULT_CODE_SHIFT;
 }
 
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
-
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index bde59825d06c..3d7b925f6516 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -222,32 +222,8 @@ register struct thread_info *current_thread_info_reg asm("g6");
  *
  * Note that there are only 8 bits available.
  */
-#define TS_RESTORE_SIGMASK	0x0001	/* restore signal mask in do_signal() */
 
 #ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, &ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
 
 #define thread32_stack_is_64bit(__SP) (((__SP) & 0x1) != 0)
 #define test_thread_64bit_stack(__SP) \
diff --git a/arch/tile/include/asm/thread_info.h b/arch/tile/include/asm/thread_info.h
index c1467ac59ce6..b7659b8f1117 100644
--- a/arch/tile/include/asm/thread_info.h
+++ b/arch/tile/include/asm/thread_info.h
@@ -166,32 +166,5 @@ extern void _cpu_idle(void);
 #ifdef __tilegx__
 #define TS_COMPAT		0x0001	/* 32-bit compatibility mode */
 #endif
-#define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal */
-
-#ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, &ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
-#endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_TILE_THREAD_INFO_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 89bff044a6f5..b45ffdda3549 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -219,32 +219,8 @@ static inline unsigned long current_stack_pointer(void)
  * have to worry about atomic accesses.
  */
 #define TS_COMPAT		0x0002	/* 32bit syscall active (64BIT)*/
-#define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal() */
 
 #ifndef __ASSEMBLY__
-#define HAVE_SET_RESTORE_SIGMASK	1
-static inline void set_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	ti->status |= TS_RESTORE_SIGMASK;
-	WARN_ON(!test_bit(TIF_SIGPENDING, (unsigned long *)&ti->flags));
-}
-static inline void clear_restore_sigmask(void)
-{
-	current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-}
-static inline bool test_restore_sigmask(void)
-{
-	return current_thread_info()->status & TS_RESTORE_SIGMASK;
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	struct thread_info *ti = current_thread_info();
-	if (!(ti->status & TS_RESTORE_SIGMASK))
-		return false;
-	ti->status &= ~TS_RESTORE_SIGMASK;
-	return true;
-}
 
 static inline bool in_ia32_syscall(void)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 553af2923824..62c68e513e39 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1547,6 +1547,9 @@ struct task_struct {
 	/* unserialized, strictly 'current' */
 	unsigned in_execve:1; /* bit to tell LSMs we're in execve */
 	unsigned in_iowait:1;
+#if !defined(TIF_RESTORE_SIGMASK)
+	unsigned restore_sigmask:1;
+#endif
 #ifdef CONFIG_MEMCG
 	unsigned memcg_may_oom:1;
 #ifndef CONFIG_SLOB
@@ -2680,6 +2683,66 @@ extern void sigqueue_free(struct sigqueue *);
 extern int send_sigqueue(struct sigqueue *,  struct task_struct *, int group);
 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
 
+#ifdef TIF_RESTORE_SIGMASK
+/*
+ * Legacy restore_sigmask accessors.  These are inefficient on
+ * SMP architectures because they require atomic operations.
+ */
+
+/**
+ * set_restore_sigmask() - make sure saved_sigmask processing gets done
+ *
+ * This sets TIF_RESTORE_SIGMASK and ensures that the arch signal code
+ * will run before returning to user mode, to process the flag.  For
+ * all callers, TIF_SIGPENDING is already set or it's no harm to set
+ * it.  TIF_RESTORE_SIGMASK need not be in the set of bits that the
+ * arch code will notice on return to user mode, in case those bits
+ * are scarce.  We set TIF_SIGPENDING here to ensure that the arch
+ * signal code always gets run when TIF_RESTORE_SIGMASK is set.
+ */
+static inline void set_restore_sigmask(void)
+{
+	set_thread_flag(TIF_RESTORE_SIGMASK);
+	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
+}
+static inline void clear_restore_sigmask(void)
+{
+	clear_thread_flag(TIF_RESTORE_SIGMASK);
+}
+static inline bool test_restore_sigmask(void)
+{
+	return test_thread_flag(TIF_RESTORE_SIGMASK);
+}
+static inline bool test_and_clear_restore_sigmask(void)
+{
+	return test_and_clear_thread_flag(TIF_RESTORE_SIGMASK);
+}
+
+#else	/* TIF_RESTORE_SIGMASK */
+
+/* Higher-quality implementation, used if TIF_RESTORE_SIGMASK doesn't exist. */
+static inline void set_restore_sigmask(void)
+{
+	current->restore_sigmask = true;
+	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
+}
+static inline void clear_restore_sigmask(void)
+{
+	current->restore_sigmask = false;
+}
+static inline bool test_restore_sigmask(void)
+{
+	return current->restore_sigmask;
+}
+static inline bool test_and_clear_restore_sigmask(void)
+{
+	if (!current->restore_sigmask)
+		return false;
+	current->restore_sigmask = false;
+	return true;
+}
+#endif
+
 static inline void restore_saved_sigmask(void)
 {
 	if (test_and_clear_restore_sigmask())
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index b4c2a485b28a..352b1542f5cc 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -105,47 +105,6 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
 
 #define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
 
-#if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
-/*
- * An arch can define its own version of set_restore_sigmask() to get the
- * job done however works, with or without TIF_RESTORE_SIGMASK.
- */
-#define HAVE_SET_RESTORE_SIGMASK	1
-
-/**
- * set_restore_sigmask() - make sure saved_sigmask processing gets done
- *
- * This sets TIF_RESTORE_SIGMASK and ensures that the arch signal code
- * will run before returning to user mode, to process the flag.  For
- * all callers, TIF_SIGPENDING is already set or it's no harm to set
- * it.  TIF_RESTORE_SIGMASK need not be in the set of bits that the
- * arch code will notice on return to user mode, in case those bits
- * are scarce.  We set TIF_SIGPENDING here to ensure that the arch
- * signal code always gets run when TIF_RESTORE_SIGMASK is set.
- */
-static inline void set_restore_sigmask(void)
-{
-	set_thread_flag(TIF_RESTORE_SIGMASK);
-	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
-}
-static inline void clear_restore_sigmask(void)
-{
-	clear_thread_flag(TIF_RESTORE_SIGMASK);
-}
-static inline bool test_restore_sigmask(void)
-{
-	return test_thread_flag(TIF_RESTORE_SIGMASK);
-}
-static inline bool test_and_clear_restore_sigmask(void)
-{
-	return test_and_clear_thread_flag(TIF_RESTORE_SIGMASK);
-}
-#endif	/* TIF_RESTORE_SIGMASK && !HAVE_SET_RESTORE_SIGMASK */
-
-#ifndef HAVE_SET_RESTORE_SIGMASK
-#error "no set_restore_sigmask() provided and default one won't work"
-#endif
-
 #endif	/* __KERNEL__ */
 
 #endif /* _LINUX_THREAD_INFO_H */

From 627393d44860386e948bb63a8e5b53f2cc44d070 Mon Sep 17 00:00:00 2001
From: Anton Blanchard <anton@samba.org>
Date: Tue, 2 Aug 2016 14:05:40 -0700
Subject: [PATCH 073/111] kernel/exit.c: quieten greatest stack depth printk

Many targets enable CONFIG_DEBUG_STACK_USAGE, and while the information
is useful, it isn't worthy of pr_warn().  Reduce it to pr_info().

Link: http://lkml.kernel.org/r/1466982072-29836-1-git-send-email-anton@ozlabs.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 84ae830234f8..2f974ae042a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -715,7 +715,7 @@ static void check_stack_usage(void)
 
 	spin_lock(&low_water_lock);
 	if (free < lowest_to_date) {
-		pr_warn("%s (%d) used greatest stack depth: %lu bytes left\n",
+		pr_info("%s (%d) used greatest stack depth: %lu bytes left\n",
 			current->comm, task_pid_nr(current), free);
 		lowest_to_date = free;
 	}

From b06fb415331a7beb841f3d20d0fe60f6f0787dba Mon Sep 17 00:00:00 2001
From: Geliang Tang <geliangtang@gmail.com>
Date: Tue, 2 Aug 2016 14:05:42 -0700
Subject: [PATCH 074/111] cpumask: fix code comment

Fix code comment for cpumask_parse().

Link: http://lkml.kernel.org/r/71aae2c60ae5dae0cf554199ce6aea8f88c69347.1465380581.git.geliangtang@gmail.com
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/cpumask.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index e828cf65d7df..da7fbf1cdd56 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -579,7 +579,7 @@ static inline int cpumask_parselist_user(const char __user *buf, int len,
 }
 
 /**
- * cpumask_parse - extract a cpumask from from a string
+ * cpumask_parse - extract a cpumask from a string
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
  *

From 4caf9615247aceab56e91df6c0e11892ea55f4f0 Mon Sep 17 00:00:00 2001
From: Minfei Huang <mnghuan@gmail.com>
Date: Tue, 2 Aug 2016 14:05:45 -0700
Subject: [PATCH 075/111] kexec: return error number directly

This is a cleanup patch that makes sanity_check_segment_list() return
error numbers directly.  The variable result is useless because no
function's return value is ever assigned to it, so remove it.
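
The resulting style can be sketched in user space as below.  This is a
simplified illustration, not the kernel function: struct segment,
check_segments() and the SKETCH_PAGE_* constants (a 4096-byte page is
assumed) are hypothetical, and only three of the checks are modelled.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct segment {
	unsigned long mem;	/* destination start address */
	unsigned long memsz;	/* destination size */
	size_t bufsz;		/* source buffer size */
};

/* Each failing check returns its own error code at the point of failure. */
static int check_segments(const struct segment *seg, unsigned long n)
{
	unsigned long i, j;

	/* Destination ranges must start and end on page boundaries. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = seg[i].mem;
		unsigned long mend = mstart + seg[i].memsz;

		if ((mstart & ~SKETCH_PAGE_MASK) || (mend & ~SKETCH_PAGE_MASK))
			return -EADDRNOTAVAIL;
	}

	/* Destination ranges must not overlap each other. */
	for (i = 0; i < n; i++) {
		unsigned long mstart = seg[i].mem;
		unsigned long mend = mstart + seg[i].memsz;

		for (j = 0; j < i; j++) {
			unsigned long pstart = seg[j].mem;
			unsigned long pend = pstart + seg[j].memsz;

			if ((mend > pstart) && (mstart < pend))
				return -EINVAL;
		}
	}

	/* The source buffer must fit in the destination range. */
	for (i = 0; i < n; i++) {
		if (seg[i].bufsz > seg[i].memsz)
			return -EINVAL;
	}

	return 0;
}
```

With the error code written at each return site, a reader does not
have to scan upwards for the last assignment to a shared variable.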

Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/kexec_core.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 56b3ed0927b0..23311c803b1b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -147,7 +147,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
 
 int sanity_check_segment_list(struct kimage *image)
 {
-	int result, i;
+	int i;
 	unsigned long nr_segments = image->nr_segments;
 
 	/*
@@ -163,16 +163,15 @@ int sanity_check_segment_list(struct kimage *image)
 	 * simply because addresses are changed to page size
 	 * granularity.
 	 */
-	result = -EADDRNOTAVAIL;
 	for (i = 0; i < nr_segments; i++) {
 		unsigned long mstart, mend;
 
 		mstart = image->segment[i].mem;
 		mend   = mstart + image->segment[i].memsz;
 		if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
-			return result;
+			return -EADDRNOTAVAIL;
 		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)
-			return result;
+			return -EADDRNOTAVAIL;
 	}
 
 	/* Verify our destination addresses do not overlap.
@@ -180,7 +179,6 @@ int sanity_check_segment_list(struct kimage *image)
 	 * through very weird things can happen with no
 	 * easy explanation as one segment stops on another.
 	 */
-	result = -EINVAL;
 	for (i = 0; i < nr_segments; i++) {
 		unsigned long mstart, mend;
 		unsigned long j;
@@ -194,7 +192,7 @@ int sanity_check_segment_list(struct kimage *image)
 			pend   = pstart + image->segment[j].memsz;
 			/* Do the segments overlap ? */
 			if ((mend > pstart) && (mstart < pend))
-				return result;
+				return -EINVAL;
 		}
 	}
 
@@ -203,10 +201,9 @@ int sanity_check_segment_list(struct kimage *image)
 	 * and it is easier to check up front than to be surprised
 	 * later on.
 	 */
-	result = -EINVAL;
 	for (i = 0; i < nr_segments; i++) {
 		if (image->segment[i].bufsz > image->segment[i].memsz)
-			return result;
+			return -EINVAL;
 	}
 
 	/*
@@ -220,7 +217,6 @@ int sanity_check_segment_list(struct kimage *image)
 	 */
 
 	if (image->type == KEXEC_TYPE_CRASH) {
-		result = -EADDRNOTAVAIL;
 		for (i = 0; i < nr_segments; i++) {
 			unsigned long mstart, mend;
 
@@ -229,7 +225,7 @@ int sanity_check_segment_list(struct kimage *image)
 			/* Ensure we are within the crash kernel limits */
 			if ((mstart < crashk_res.start) ||
 			    (mend > crashk_res.end))
-				return result;
+				return -EADDRNOTAVAIL;
 		}
 	}
 

From f7f0b7dc720f81b53afffb6779437086cdc3f62d Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:05:48 -0700
Subject: [PATCH 076/111] ARM: kdump: advertise boot aliased crash kernel
 resource

Advertise a resource which describes where the crash kernel is located
in the boot view of RAM.  This allows kexec-tools to have this vital
information.

Link: http://lkml.kernel.org/r/E1b8knz-0004H4-Bd@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/kernel/setup.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index da2f6c360f6b..6c8c888c1152 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -1000,9 +1000,25 @@ static void __init reserve_crashkernel(void)
 		(unsigned long)(crash_base >> 20),
 		(unsigned long)(total_mem >> 20));
 
+	/* The crashk resource must always be located in normal mem */
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
 	insert_resource(&iomem_resource, &crashk_res);
+
+	if (arm_has_idmap_alias()) {
+		/*
+		 * If we have a special RAM alias for use at boot, we
+		 * need to advertise to kexec tools where the alias is.
+		 */
+		static struct resource crashk_boot_res = {
+			.name = "Crash kernel (boot alias)",
+			.flags = IORESOURCE_BUSY | IORESOURCE_MEM,
+		};
+
+		crashk_boot_res.start = phys_to_idmap(crash_base);
+		crashk_boot_res.end = crashk_boot_res.start + crash_size - 1;
+		insert_resource(&iomem_resource, &crashk_boot_res);
+	}
 }
 #else
 static inline void reserve_crashkernel(void) {}

From 966fab00b0e19e0db3cb11d81bda5d0940176d5e Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:05:51 -0700
Subject: [PATCH 077/111] ARM: kexec: advertise location of bootable RAM

Advertise the location of bootable RAM to kexec-tools.  kexec needs to
know where it can place the kernel in RAM so that it is executable when
the system needs to jump into it.

Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.

Link: http://lkml.kernel.org/r/E1b8ko4-0004HA-GF@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/kernel/setup.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6c8c888c1152..df7f2a75e769 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -848,10 +848,29 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
 	kernel_data.end     = virt_to_phys(_end - 1);
 
 	for_each_memblock(memory, region) {
+		phys_addr_t start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
+		phys_addr_t end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+		unsigned long boot_alias_start;
+
+		/*
+		 * Some systems have a special memory alias which is only
+		 * used for booting.  We need to advertise this region to
+		 * kexec-tools so they know where bootable RAM is located.
+		 */
+		boot_alias_start = phys_to_idmap(start);
+		if (arm_has_idmap_alias() && boot_alias_start != IDMAP_INVALID_ADDR) {
+			res = memblock_virt_alloc(sizeof(*res), 0);
+			res->name = "System RAM (boot alias)";
+			res->start = boot_alias_start;
+			res->end = phys_to_idmap(end);
+			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+			request_resource(&iomem_resource, res);
+		}
+
 		res = memblock_virt_alloc(sizeof(*res), 0);
 		res->name  = "System RAM";
-		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
-		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+		res->start = start;
+		res->end = end;
 		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
 		request_resource(&iomem_resource, res);

From dc5cccacf4272da4aba20a1fc0804d59d985ab32 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:05:54 -0700
Subject: [PATCH 078/111] kexec: don't invoke OOM-killer for control page
 allocation

If we are unable to find a suitable page when allocating the control
page, do not invoke the OOM-killer: killing processes probably isn't
going to help.

Link: http://lkml.kernel.org/r/E1b8ko9-0004HG-R5@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/kexec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e8acb2b43dd9..ce2fe197f583 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -41,7 +41,7 @@
 #endif
 
 #ifndef KEXEC_CONTROL_MEMORY_GFP
-#define KEXEC_CONTROL_MEMORY_GFP GFP_KERNEL
+#define KEXEC_CONTROL_MEMORY_GFP (GFP_KERNEL | __GFP_NORETRY)
 #endif
 
 #ifndef KEXEC_CONTROL_PAGE_SIZE

From 465d377701dfe6a08a9f361a3fd926dea7f89c74 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:05:57 -0700
Subject: [PATCH 079/111] kexec: ensure user memory sizes do not wrap

Ensure that user memory sizes do not wrap around when validating the
user input, which can cause the input validation that follows to work
incorrectly.
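
As an illustration only (this sketch is not part of the patch), the
wrap can be reproduced in userspace C.  The sketch_* names and the
4 KiB page size are assumptions made for the example; only the order
of the checks follows the hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-ins for PAGE_SIZE/PAGE_MASK; 4 KiB pages assumed. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Returns 0 if the segment passes, -EADDRNOTAVAIL otherwise. */
static int sketch_check_segment(unsigned long mem, unsigned long memsz)
{
	unsigned long mstart = mem;
	unsigned long mend = mstart + memsz;	/* unsigned add may wrap */

	if (mstart > mend)			/* the sum wrapped around */
		return -EADDRNOTAVAIL;
	if ((mstart & ~SKETCH_PAGE_MASK) || (mend & ~SKETCH_PAGE_MASK))
		return -EADDRNOTAVAIL;
	return 0;
}

/* Small usage example for the sketch above. */
static void sketch_selftest(void)
{
	/* An ordinary, page-aligned segment is accepted. */
	assert(sketch_check_segment(0x100000UL, 0x2000UL) == 0);
	/* Last page of the address space plus two pages wraps to 4096. */
	assert(sketch_check_segment(ULONG_MAX - 4095UL, 8192UL) ==
	       -EADDRNOTAVAIL);
}
```

With a page-aligned start near the top of the address space, the sum
wraps to a small aligned value, so the alignment test alone would
accept the segment; the mstart > mend test rejects it.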

[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/kexec_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23311c803b1b..5a83b2a9d584 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -168,6 +168,8 @@ int sanity_check_segment_list(struct kimage *image)
 
 		mstart = image->segment[i].mem;
 		mend   = mstart + image->segment[i].memsz;
+		if (mstart > mend)
+			return -EADDRNOTAVAIL;
 		if ((mstart & ~PAGE_MASK) || (mend & ~PAGE_MASK))
 			return -EADDRNOTAVAIL;
 		if (mend >= KEXEC_DESTINATION_MEMORY_LIMIT)

From dae28018f56645b61f5beb84d5831346d3c5e457 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:06:00 -0700
Subject: [PATCH 080/111] kdump: arrange for paddr_vmcoreinfo_note() to return
 phys_addr_t

On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here.  Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.

This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
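
A minimal sketch of the truncation being avoided, assuming a 32-bit
"unsigned long" modelled with uint32_t and an arbitrary example address
above 4GB (neither is taken from the patch; the sketch_* names are
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of returning a physical address through a 32-bit unsigned long. */
static uint32_t sketch_paddr_as_ulong32(uint64_t paddr)
{
	return (uint32_t)paddr;	/* bits 32 and above are dropped */
}

/* Model of returning it through a 64-bit phys_addr_t (as with LPAE). */
static uint64_t sketch_paddr_as_phys_addr(uint64_t paddr)
{
	return paddr;
}

/* Small usage example for the sketch above. */
static void sketch_selftest(void)
{
	uint64_t note = 0x880000000ULL;	/* example address above 4GB */

	assert(sketch_paddr_as_ulong32(note) == 0x80000000U);
	assert(sketch_paddr_as_phys_addr(note) == note);
}
```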

Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Pratyush Anand <panand@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/ia64/kernel/machine_kexec.c | 2 +-
 include/linux/kexec.h            | 2 +-
 kernel/kexec_core.c              | 2 +-
 kernel/ksysfs.c                  | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c
index b72cd7a07222..599507bcec91 100644
--- a/arch/ia64/kernel/machine_kexec.c
+++ b/arch/ia64/kernel/machine_kexec.c
@@ -163,7 +163,7 @@ void arch_crash_save_vmcoreinfo(void)
 #endif
 }
 
-unsigned long paddr_vmcoreinfo_note(void)
+phys_addr_t paddr_vmcoreinfo_note(void)
 {
 	return ia64_tpa((unsigned long)(char *)&vmcoreinfo_note);
 }
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ce2fe197f583..555227f0029f 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -233,7 +233,7 @@ void crash_save_vmcoreinfo(void);
 void arch_crash_save_vmcoreinfo(void);
 __printf(1, 2)
 void vmcoreinfo_append_str(const char *fmt, ...);
-unsigned long paddr_vmcoreinfo_note(void);
+phys_addr_t paddr_vmcoreinfo_note(void);
 
 #define VMCOREINFO_OSRELEASE(value) \
 	vmcoreinfo_append_str("OSRELEASE=%s\n", value)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5a83b2a9d584..dab03f17be25 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1372,7 +1372,7 @@ void vmcoreinfo_append_str(const char *fmt, ...)
 void __weak arch_crash_save_vmcoreinfo(void)
 {}
 
-unsigned long __weak paddr_vmcoreinfo_note(void)
+phys_addr_t __weak paddr_vmcoreinfo_note(void)
 {
 	return __pa((unsigned long)(char *)&vmcoreinfo_note);
 }
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 152da4a48867..9f1920d2d0c6 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -128,8 +128,8 @@ KERNEL_ATTR_RW(kexec_crash_size);
 static ssize_t vmcoreinfo_show(struct kobject *kobj,
 			       struct kobj_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%lx %x\n",
-		       paddr_vmcoreinfo_note(),
+	phys_addr_t vmcore_base = paddr_vmcoreinfo_note();
+	return sprintf(buf, "%pa %x\n", &vmcore_base,
 		       (unsigned int)sizeof(vmcoreinfo_note));
 }
 KERNEL_ATTR_RO(vmcoreinfo);

From 43546d8669d62d75fa69ca9a45d2f586665f56bd Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:06:04 -0700
Subject: [PATCH 081/111] kexec: allow architectures to override boot mapping

kexec physical addresses are the boot-time view of the system.  For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.

To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses.  This patch extracts the current translation points into
linux/kexec.h, and allows an architecture to override the functions.

Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
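
The override pattern can be sketched in userspace C as follows.  This
is an illustration under stated assumptions, not the kernel code: the
sketch_* names are hypothetical, and the constant alias offset is an
example value chosen so that 0x800000000 maps to 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_phys_addr_t;	/* stand-in for phys_addr_t */

/* Assumed example offset between the kernel view and the boot alias. */
#define SKETCH_ALIAS_OFFSET 0x780000000ULL

/* "Architecture" part: define the function, then a same-named macro. */
static inline unsigned long sketch_phys_to_boot_phys(sketch_phys_addr_t phys)
{
	assert(phys >= SKETCH_ALIAS_OFFSET);
	return (unsigned long)(phys - SKETCH_ALIAS_OFFSET);
}
#define sketch_phys_to_boot_phys sketch_phys_to_boot_phys

static inline sketch_phys_addr_t sketch_boot_phys_to_phys(unsigned long boot)
{
	return (sketch_phys_addr_t)boot + SKETCH_ALIAS_OFFSET;
}
#define sketch_boot_phys_to_phys sketch_boot_phys_to_phys

/* "Generic" part: identity fallbacks, compiled only if not overridden. */
#ifndef sketch_phys_to_boot_phys
static inline unsigned long sketch_phys_to_boot_phys(sketch_phys_addr_t phys)
{
	return phys;
}
#endif

#ifndef sketch_boot_phys_to_phys
static inline sketch_phys_addr_t sketch_boot_phys_to_phys(unsigned long boot)
{
	return boot;
}
#endif
```

Because the macro has the same name as the function, the #ifndef test
in the generic part sees the override and skips the identity version;
an architecture that defines nothing gets the identity mapping.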

[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/kexec.h | 40 ++++++++++++++++++++++++++++++++++++++++
 kernel/kexec.c        |  3 ++-
 kernel/kexec_core.c   | 26 +++++++++++++-------------
 3 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 555227f0029f..23e14a460cfb 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -14,6 +14,8 @@
 
 #if !defined(__ASSEMBLY__)
 
+#include <asm/io.h>
+
 #include <uapi/linux/kexec.h>
 
 #ifdef CONFIG_KEXEC_CORE
@@ -318,6 +320,44 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 void arch_kexec_protect_crashkres(void);
 void arch_kexec_unprotect_crashkres(void);
 
+#ifndef page_to_boot_pfn
+static inline unsigned long page_to_boot_pfn(struct page *page)
+{
+	return page_to_pfn(page);
+}
+#endif
+
+#ifndef boot_pfn_to_page
+static inline struct page *boot_pfn_to_page(unsigned long boot_pfn)
+{
+	return pfn_to_page(boot_pfn);
+}
+#endif
+
+#ifndef phys_to_boot_phys
+static inline unsigned long phys_to_boot_phys(phys_addr_t phys)
+{
+	return phys;
+}
+#endif
+
+#ifndef boot_phys_to_phys
+static inline phys_addr_t boot_phys_to_phys(unsigned long boot_phys)
+{
+	return boot_phys;
+}
+#endif
+
+static inline unsigned long virt_to_boot_phys(void *addr)
+{
+	return phys_to_boot_phys(__pa((unsigned long)addr));
+}
+
+static inline void *boot_phys_to_virt(unsigned long entry)
+{
+	return phys_to_virt(boot_phys_to_phys(entry));
+}
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 4384672d3245..980936a90ee6 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -48,7 +48,8 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 
 	if (kexec_on_panic) {
 		/* Verify we have a valid entry point */
-		if ((entry < crashk_res.start) || (entry > crashk_res.end))
+		if ((entry < phys_to_boot_phys(crashk_res.start)) ||
+		    (entry > phys_to_boot_phys(crashk_res.end)))
 			return -EADDRNOTAVAIL;
 	}
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index dab03f17be25..73d4c5f57dd8 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -225,8 +225,8 @@ int sanity_check_segment_list(struct kimage *image)
 			mstart = image->segment[i].mem;
 			mend = mstart + image->segment[i].memsz - 1;
 			/* Ensure we are within the crash kernel limits */
-			if ((mstart < crashk_res.start) ||
-			    (mend > crashk_res.end))
+			if ((mstart < phys_to_boot_phys(crashk_res.start)) ||
+			    (mend > phys_to_boot_phys(crashk_res.end)))
 				return -EADDRNOTAVAIL;
 		}
 	}
@@ -350,7 +350,7 @@ static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
 		pages = kimage_alloc_pages(KEXEC_CONTROL_MEMORY_GFP, order);
 		if (!pages)
 			break;
-		pfn   = page_to_pfn(pages);
+		pfn   = page_to_boot_pfn(pages);
 		epfn  = pfn + count;
 		addr  = pfn << PAGE_SHIFT;
 		eaddr = epfn << PAGE_SHIFT;
@@ -476,7 +476,7 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
 			return -ENOMEM;
 
 		ind_page = page_address(page);
-		*image->entry = virt_to_phys(ind_page) | IND_INDIRECTION;
+		*image->entry = virt_to_boot_phys(ind_page) | IND_INDIRECTION;
 		image->entry = ind_page;
 		image->last_entry = ind_page +
 				      ((PAGE_SIZE/sizeof(kimage_entry_t)) - 1);
@@ -531,13 +531,13 @@ void kimage_terminate(struct kimage *image)
 #define for_each_kimage_entry(image, ptr, entry) \
 	for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
 		ptr = (entry & IND_INDIRECTION) ? \
-			phys_to_virt((entry & PAGE_MASK)) : ptr + 1)
+			boot_phys_to_virt((entry & PAGE_MASK)) : ptr + 1)
 
 static void kimage_free_entry(kimage_entry_t entry)
 {
 	struct page *page;
 
-	page = pfn_to_page(entry >> PAGE_SHIFT);
+	page = boot_pfn_to_page(entry >> PAGE_SHIFT);
 	kimage_free_pages(page);
 }
 
@@ -631,7 +631,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
 	 * have a match.
 	 */
 	list_for_each_entry(page, &image->dest_pages, lru) {
-		addr = page_to_pfn(page) << PAGE_SHIFT;
+		addr = page_to_boot_pfn(page) << PAGE_SHIFT;
 		if (addr == destination) {
 			list_del(&page->lru);
 			return page;
@@ -646,12 +646,12 @@ static struct page *kimage_alloc_page(struct kimage *image,
 		if (!page)
 			return NULL;
 		/* If the page cannot be used file it away */
-		if (page_to_pfn(page) >
+		if (page_to_boot_pfn(page) >
 				(KEXEC_SOURCE_MEMORY_LIMIT >> PAGE_SHIFT)) {
 			list_add(&page->lru, &image->unusable_pages);
 			continue;
 		}
-		addr = page_to_pfn(page) << PAGE_SHIFT;
+		addr = page_to_boot_pfn(page) << PAGE_SHIFT;
 
 		/* If it is the destination page we want use it */
 		if (addr == destination)
@@ -674,7 +674,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
 			struct page *old_page;
 
 			old_addr = *old & PAGE_MASK;
-			old_page = pfn_to_page(old_addr >> PAGE_SHIFT);
+			old_page = boot_pfn_to_page(old_addr >> PAGE_SHIFT);
 			copy_highpage(page, old_page);
 			*old = addr | (*old & ~PAGE_MASK);
 
@@ -730,7 +730,7 @@ static int kimage_load_normal_segment(struct kimage *image,
 			result  = -ENOMEM;
 			goto out;
 		}
-		result = kimage_add_page(image, page_to_pfn(page)
+		result = kimage_add_page(image, page_to_boot_pfn(page)
 								<< PAGE_SHIFT);
 		if (result < 0)
 			goto out;
@@ -791,7 +791,7 @@ static int kimage_load_crash_segment(struct kimage *image,
 		char *ptr;
 		size_t uchunk, mchunk;
 
-		page = pfn_to_page(maddr >> PAGE_SHIFT);
+		page = boot_pfn_to_page(maddr >> PAGE_SHIFT);
 		if (!page) {
 			result  = -ENOMEM;
 			goto out;
@@ -919,7 +919,7 @@ void __weak crash_free_reserved_phys_range(unsigned long begin,
 	unsigned long addr;
 
 	for (addr = begin; addr < end; addr += PAGE_SIZE)
-		free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
+		free_reserved_page(boot_pfn_to_page(addr >> PAGE_SHIFT));
 }
 
 int crash_shrink_memory(unsigned long new_size)

From 51d5d12b8f3df2b770974ce4aa6196c6b7d485eb Mon Sep 17 00:00:00 2001
From: Vitaly Andrianov <vitalya@ti.com>
Date: Tue, 2 Aug 2016 14:06:07 -0700
Subject: [PATCH 082/111] ARM: keystone: dts: add psci command definition

This commit adds definitions for the cpu_on, cpu_off and cpu_suspend
commands.  These definitions must match the corresponding PSCI
definitions in the boot monitor.

Having these commands and the corresponding PSCI support in the boot
monitor allows run-time CPU hotplug.

Link: http://lkml.kernel.org/r/E1b8koV-0004Hf-2j@rmk-PC.armlinux.org.uk
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/boot/dts/keystone.dtsi | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/boot/dts/keystone.dtsi b/arch/arm/boot/dts/keystone.dtsi
index e23f46d15c80..00cb314d5e4d 100644
--- a/arch/arm/boot/dts/keystone.dtsi
+++ b/arch/arm/boot/dts/keystone.dtsi
@@ -70,6 +70,14 @@
 		cpu_on		= <0x84000003>;
 	};
 
+	psci {
+		compatible	= "arm,psci";
+		method		= "smc";
+		cpu_suspend	= <0x84000001>;
+		cpu_off		= <0x84000002>;
+		cpu_on		= <0x84000003>;
+	};
+
 	soc {
 		#address-cells = <1>;
 		#size-cells = <1>;

From 0719392a61a9dbc2c850bc7bd1a17efba953fcf5 Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 2 Aug 2016 14:06:10 -0700
Subject: [PATCH 083/111] ARM: kexec: fix kexec for Keystone 2

Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch.  We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot-view physical
address.

Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Keerthy <j-keerthy@ti.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Vitaly Andrianov <vitalya@ti.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/arm/include/asm/kexec.h    | 24 ++++++++++++++++++++++++
 arch/arm/kernel/machine_kexec.c |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kexec.h b/arch/arm/include/asm/kexec.h
index c2b9b4bdec00..1869af6bac5c 100644
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -53,6 +53,30 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
 /* Function pointer to optional machine-specific reinitialization */
 extern void (*kexec_reinit)(void);
 
+static inline unsigned long phys_to_boot_phys(phys_addr_t phys)
+{
+	return phys_to_idmap(phys);
+}
+#define phys_to_boot_phys phys_to_boot_phys
+
+static inline phys_addr_t boot_phys_to_phys(unsigned long entry)
+{
+	return idmap_to_phys(entry);
+}
+#define boot_phys_to_phys boot_phys_to_phys
+
+static inline unsigned long page_to_boot_pfn(struct page *page)
+{
+	return page_to_pfn(page) + (arch_phys_to_idmap_offset >> PAGE_SHIFT);
+}
+#define page_to_boot_pfn page_to_boot_pfn
+
+static inline struct page *boot_pfn_to_page(unsigned long boot_pfn)
+{
+	return pfn_to_page(boot_pfn - (arch_phys_to_idmap_offset >> PAGE_SHIFT));
+}
+#define boot_pfn_to_page boot_pfn_to_page
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* CONFIG_KEXEC */
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 59fd0e24c56b..b18c1ea56bed 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -57,7 +57,7 @@ int machine_kexec_prepare(struct kimage *image)
 	for (i = 0; i < image->nr_segments; i++) {
 		current_segment = &image->segment[i];
 
-		if (!memblock_is_region_memory(current_segment->mem,
+		if (!memblock_is_region_memory(idmap_to_phys(current_segment->mem),
 					       current_segment->memsz))
 			return -EINVAL;
 

From b26e27ddfd2a986dc53e259aba572f3aac182eb8 Mon Sep 17 00:00:00 2001
From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Date: Tue, 2 Aug 2016 14:06:13 -0700
Subject: [PATCH 084/111] kexec: use core_param for crash_kexec_post_notifiers
 boot option

crash_kexec_post_notifiers is a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time.  So, use core_param instead of early_param.

Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/panic.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 8aa74497cc5a..ca8cea1ef673 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -108,6 +108,7 @@ void panic(const char *fmt, ...)
 	long i, i_next = 0;
 	int state = 0;
 	int old_cpu, this_cpu;
+	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
@@ -160,7 +161,7 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!crash_kexec_post_notifiers) {
+	if (!_crash_kexec_post_notifiers) {
 		printk_nmi_flush_on_panic();
 		__crash_kexec(NULL);
 	}
@@ -191,7 +192,7 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (crash_kexec_post_notifiers)
+	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
 	bust_spinlocks(0);
@@ -571,13 +572,7 @@ EXPORT_SYMBOL(__stack_chk_fail);
 core_param(panic, panic_timeout, int, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
-
-static int __init setup_crash_kexec_post_notifiers(char *s)
-{
-	crash_kexec_post_notifiers = true;
-	return 0;
-}
-early_param("crash_kexec_post_notifiers", setup_crash_kexec_post_notifiers);
+core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
 
 static int __init oops_setup(char *s)
 {

From 21db79e8bb054d0351a6b1b464f1c9c47a2e6e8d Mon Sep 17 00:00:00 2001
From: Petr Tesarik <ptesarik@suse.com>
Date: Tue, 2 Aug 2016 14:06:16 -0700
Subject: [PATCH 085/111] kexec: add a kexec_crash_loaded() function

Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded.  It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.

I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.

Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/kexec.h | 2 ++
 kernel/kexec_core.c   | 6 ++++++
 kernel/ksysfs.c       | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 23e14a460cfb..d7437777baaa 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -230,6 +230,7 @@ extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
+int kexec_crash_loaded(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 void crash_save_vmcoreinfo(void);
 void arch_crash_save_vmcoreinfo(void);
@@ -364,6 +365,7 @@ struct task_struct;
 static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
+static inline int kexec_crash_loaded(void) { return 0; }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 73d4c5f57dd8..704534029a00 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -95,6 +95,12 @@ int kexec_should_crash(struct task_struct *p)
 	return 0;
 }
 
+int kexec_crash_loaded(void)
+{
+	return !!kexec_crash_image;
+}
+EXPORT_SYMBOL_GPL(kexec_crash_loaded);
+
 /*
  * When kexec transitions to the new kernel there is a one-to-one
  * mapping between physical and virtual addresses.  On processors
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 9f1920d2d0c6..ee1bc1bb8feb 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -101,7 +101,7 @@ KERNEL_ATTR_RO(kexec_loaded);
 static ssize_t kexec_crash_loaded_show(struct kobject *kobj,
 				       struct kobj_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", !!kexec_crash_image);
+	return sprintf(buf, "%d\n", kexec_crash_loaded());
 }
 KERNEL_ATTR_RO(kexec_crash_loaded);
 

From c0253115968c35f3e1ee497282efb75ccf29fb98 Mon Sep 17 00:00:00 2001
From: Petr Tesarik <ptesarik@suse.com>
Date: Tue, 2 Aug 2016 14:06:19 -0700
Subject: [PATCH 086/111] kexec: allow kdump with crash_kexec_post_notifiers

If a crash kernel is loaded, do not crash the running domain.  This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.

[akpm@linux-foundation.org: build fix: unconditionally include kexec.h]
Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/xen/enlighten.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index cd993051aed7..8ffb089b19a5 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -34,9 +34,7 @@
 #include <linux/edd.h>
 #include <linux/frame.h>
 
-#ifdef CONFIG_KEXEC_CORE
 #include <linux/kexec.h>
-#endif
 
 #include <xen/xen.h>
 #include <xen/events.h>
@@ -1334,7 +1332,8 @@ static void xen_crash_shutdown(struct pt_regs *regs)
 static int
 xen_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
-	xen_reboot(SHUTDOWN_crash);
+	if (!kexec_crash_loaded())
+		xen_reboot(SHUTDOWN_crash);
 	return NOTIFY_DONE;
 }
 

From 1730f146604ea426e54938cdbcf87df1047ef0dc Mon Sep 17 00:00:00 2001
From: zhong jiang <zhongjiang@huawei.com>
Date: Tue, 2 Aug 2016 14:06:22 -0700
Subject: [PATCH 087/111] kexec: add restriction on kexec_load() segment sizes

I hit the following issue when running trinity on my system.  The kernel
is version 3.4, but mainline has the same issue.

The root cause is that the segment size is too large, so the kernel
spends too long trying to allocate a page.  Other cases will block until
the test case quits.  Also, OOM conditions will occur.

Call Trace:
  __alloc_pages_nodemask+0x14c/0x8f0
  alloc_pages_current+0xaf/0x120
  kimage_alloc_pages+0x10/0x60
  kimage_alloc_control_pages+0x5d/0x270
  machine_kexec_prepare+0xe5/0x6c0
  ? kimage_free_page_list+0x52/0x70
  sys_kexec_load+0x141/0x600
  ? vfs_write+0x100/0x180
  system_call_fastpath+0x16/0x1b

The patch changes sanity_check_segment_list() to verify that the usage by
all segments does not exceed half of memory.

[akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/kexec_core.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
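
Note on the check: the limit described in the commit message can be tried
out in user space with the sketch below. It is a model under stated
assumptions, not the kernel implementation: the MODEL_ names and
model_check_segments() are hypothetical, a 4 KiB page size is assumed, and
-1 stands in for the -EINVAL returned by the kernel.

```c
#include <stddef.h>

/* Assumed 4 KiB pages; the kernel uses PAGE_SHIFT and PAGE_SIZE. */
#define MODEL_PAGE_SHIFT	12UL
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)

/* Round a byte count up to whole pages, like PAGE_COUNT() in the patch. */
#define MODEL_PAGE_COUNT(x) \
	(((x) + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT)

/*
 * Reject a segment list if any single segment, or the sum of all
 * segments, needs more than half of the available pages.
 * Returns 0 if the list is acceptable and -1 otherwise.
 */
static int model_check_segments(const unsigned long *memsz, size_t nr,
				unsigned long totalram_pages)
{
	unsigned long total_pages = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long pages = MODEL_PAGE_COUNT(memsz[i]);

		if (pages > totalram_pages / 2)
			return -1;

		total_pages += pages;
	}

	if (total_pages > totalram_pages / 2)
		return -1;

	return 0;
}
```

The per-segment test inside the loop rejects an oversized segment early;
the test after the loop catches lists whose segments are individually
small but too large in total.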

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 704534029a00..561675589511 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -146,6 +146,7 @@ EXPORT_SYMBOL_GPL(kexec_crash_loaded);
  * allocating pages whose destination address we do not care about.
  */
 #define KIMAGE_NO_DEST (-1UL)
+#define PAGE_COUNT(x) (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
 
 static struct page *kimage_alloc_page(struct kimage *image,
 				       gfp_t gfp_mask,
@@ -155,6 +156,7 @@ int sanity_check_segment_list(struct kimage *image)
 {
 	int i;
 	unsigned long nr_segments = image->nr_segments;
+	unsigned long total_pages = 0;
 
 	/*
 	 * Verify we have good destination addresses.  The caller is
@@ -214,6 +216,21 @@ int sanity_check_segment_list(struct kimage *image)
 			return -EINVAL;
 	}
 
+	/*
+	 * Verify that no more than half of memory will be consumed. If the
+	 * request from userspace is too large, a large amount of time will be
+	 * wasted allocating pages, which can cause a soft lockup.
+	 */
+	for (i = 0; i < nr_segments; i++) {
+		if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2)
+			return -EINVAL;
+
+		total_pages += PAGE_COUNT(image->segment[i].memsz);
+	}
+
+	if (total_pages > totalram_pages / 2)
+		return -EINVAL;
+
 	/*
 	 * Verify we have good destination addresses.  Normally
 	 * the caller is responsible for making certain we don't

From b6e8d4aa1110306378af0f3472a6b85a1f039a16 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:25 -0700
Subject: [PATCH 088/111] rapidio: add RapidIO channelized messaging driver

Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.

This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT.  Additional input was
received from other members of RapidIO.org.

The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.

This char mode device driver allows user-space applications to set up
messaging communication channels using a single shared RapidIO messaging
mailbox.

By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).

[weiyj.lk@gmail.com: rapidio/rio_cm: fix return value check in riocm_init()]
  Link: http://lkml.kernel.org/r/1469198221-21970-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1468952862-18056-1-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/rapidio/rio_cm.txt |  119 ++
 drivers/rapidio/Kconfig          |    9 +
 drivers/rapidio/Makefile         |    1 +
 drivers/rapidio/rio_cm.c         | 2366 ++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild        |    1 +
 include/uapi/linux/rio_cm_cdev.h |   78 +
 6 files changed, 2574 insertions(+)
 create mode 100644 Documentation/rapidio/rio_cm.txt
 create mode 100644 drivers/rapidio/rio_cm.c
 create mode 100644 include/uapi/linux/rio_cm_cdev.h
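
Note on the per-channel receive ring: rio_rx_data_handler() in this patch
stores inbound buffers in struct chan_rx_ring using a head index, a count
and a modulo wrap, and discards the packet when the ring is full. The
sketch below models that bookkeeping in user space. All model_ names are
hypothetical, -1 stands in for -ENOMEM, locking is omitted, and the pop
side is an assumption about how a consumer would drain the ring, since the
receive path is not shown in this part of the patch.

```c
#include <stddef.h>

#define MODEL_RING_SIZE	128	/* mirrors RIOCM_RX_RING_SIZE */

struct model_rx_ring {
	void	*buf[MODEL_RING_SIZE];
	int	head;	/* next slot to fill */
	int	tail;	/* next slot to drain */
	int	count;	/* number of queued buffers */
};

/* Queue a buffer; returns 0 on success or -1 if the ring is full. */
static int model_ring_push(struct model_rx_ring *r, void *msg)
{
	if (r->count == MODEL_RING_SIZE)
		return -1;	/* the driver frees the packet here */

	r->buf[r->head] = msg;
	r->head = (r->head + 1) % MODEL_RING_SIZE;
	r->count++;
	return 0;
}

/* Assumed consumer side: returns the oldest buffer or NULL if empty. */
static void *model_ring_pop(struct model_rx_ring *r)
{
	void *msg;

	if (r->count == 0)
		return NULL;

	msg = r->buf[r->tail];
	r->buf[r->tail] = NULL;
	r->tail = (r->tail + 1) % MODEL_RING_SIZE;
	r->count--;
	return msg;
}
```

Keeping an explicit count lets the full and empty cases be told apart even
though head equals tail in both.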

diff --git a/Documentation/rapidio/rio_cm.txt b/Documentation/rapidio/rio_cm.txt
new file mode 100644
index 000000000000..27aa401f1126
--- /dev/null
+++ b/Documentation/rapidio/rio_cm.txt
@@ -0,0 +1,119 @@
+RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
+==========================================================================
+
+Version History:
+----------------
+  1.0.0 - Initial driver release.
+
+==========================================================================
+
+I. Overview
+
+This device driver is the result of collaboration within the RapidIO.org
+Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
+Nokia Networks, BAE and IDT.  Additional input was received from other members
+of RapidIO.org.
+
+The objective was to create a character mode driver interface which exposes
+messaging capabilities of RapidIO endpoint devices (mports) directly
+to applications, in a manner that allows the numerous and varied RapidIO
+implementations to interoperate.
+
+This driver (RIO_CM) provides to user-space applications shared access to
+RapidIO mailbox messaging resources.
+
+RapidIO specification (Part 2) defines that endpoint devices may have up to four
+messaging mailboxes in case of multi-packet message (up to 4KB) and
+up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
+to protocol definition limitations, a particular hardware implementation can
+have reduced number of messaging mailboxes.  RapidIO aware applications must
+therefore share the messaging resources of a RapidIO endpoint.
+
+Main purpose of this device driver is to provide RapidIO mailbox messaging
+capability to large number of user-space processes by introducing socket-like
+operations using a single messaging mailbox.  This allows applications to
+use the limited RapidIO messaging hardware resources efficiently.
+
+Most of device driver's operations are supported through 'ioctl' system calls.
+
+When loaded this device driver creates a single file system node named rio_cm
+in /dev directory common for all registered RapidIO mport devices.
+
+Following ioctl commands are available to user-space applications:
+
+- RIO_CM_MPORT_GET_LIST : Returns to caller list of local mport devices that
+    support messaging operations (number of entries up to RIO_MAX_MPORTS).
+    Each list entry is combination of mport's index in the system and RapidIO
+    destination ID assigned to the port.
+- RIO_CM_EP_GET_LIST_SIZE : Returns number of messaging capable remote endpoints
+    in a RapidIO network associated with the specified mport device.
+- RIO_CM_EP_GET_LIST : Returns list of RapidIO destination IDs for messaging
+    capable remote endpoints (peers) available in a RapidIO network associated
+    with the specified mport device.
+- RIO_CM_CHAN_CREATE : Creates RapidIO message exchange channel data structure
+    with channel ID assigned automatically or as requested by a caller.
+- RIO_CM_CHAN_BIND : Binds the specified channel data structure to the specified
+    mport device.
+- RIO_CM_CHAN_LISTEN : Enables listening for connection requests on the specified
+    channel.
+- RIO_CM_CHAN_ACCEPT : Accepts a connection request from peer on the specified
+    channel. If a wait timeout for this request is specified by the caller,
+    this is a blocking call. If the timeout is set to 0, this is a non-blocking
+    call: the ioctl handler checks for a pending connection request and, if one
+    is not available, exits immediately with -EAGAIN error status.
+- RIO_CM_CHAN_CONNECT : Sends a connection request to a remote peer/channel.
+- RIO_CM_CHAN_SEND : Sends a data message through the specified channel.
+    The handler for this request assumes that message buffer specified by
+    a caller includes the reserved space for a packet header required by
+    this driver.
+- RIO_CM_CHAN_RECEIVE : Receives a data message through a connected channel.
+    If the channel does not have an incoming message ready to return this ioctl
+    handler will wait for new message until timeout specified by a caller
+    expires. If timeout value is set to 0, ioctl handler uses a default value
+    defined by MAX_SCHEDULE_TIMEOUT.
+- RIO_CM_CHAN_CLOSE : Closes a specified channel and frees associated buffers.
+    If the specified channel is in the CONNECTED state, sends close notification
+    to the remote peer.
+
+The ioctl command codes and corresponding data structures intended for use by
+user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
+
+II. Hardware Compatibility
+
+This device driver uses standard interfaces defined by kernel RapidIO subsystem
+and therefore it can be used with any mport device driver registered by RapidIO
+subsystem with limitations set by available mport HW implementation of messaging
+mailboxes.
+
+III. Module parameters
+
+- 'dbg_level' - This parameter controls the amount of debug information
+        generated by this device driver. This parameter is formed by a set of
+        bit masks that correspond to specific functional blocks.
+        For mask definitions see 'drivers/rapidio/rio_cm.c'.
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+- 'cmbox' - Number of RapidIO mailbox to use (default value is 1).
+        This parameter sets the messaging mailbox number that will be used
+        within the entire RapidIO network. It can be used when the default
+        mailbox is used by other device drivers or is not supported by some
+        nodes in the RapidIO network.
+
+- 'chstart' - Start channel number for dynamic assignment. Default value - 256.
+        Excludes channel numbers below this parameter from dynamic
+        allocation to avoid conflicts with software components that use
+        reserved predefined channel numbers.
+
+IV. Known problems
+
+  None.
+
+V. User-space Applications and API Library
+
+Messaging API library and applications that use this device driver are available
+from RapidIO.org.
+
+VI. TODO List
+
+- Add support for system notification messages (reserved channel 0).
diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
index b5a10d3c92c7..d6d2f20c4597 100644
--- a/drivers/rapidio/Kconfig
+++ b/drivers/rapidio/Kconfig
@@ -67,6 +67,15 @@ config RAPIDIO_ENUM_BASIC
 
 endchoice
 
+config RAPIDIO_CHMAN
+	tristate "RapidIO Channelized Messaging driver"
+	depends on RAPIDIO
+	help
+	  This option includes RapidIO channelized messaging driver which
+	  provides socket-like interface to allow sharing of single RapidIO
+	  messaging mailbox between multiple user-space applications.
+	  See "Documentation/rapidio/rio_cm.txt" for driver description.
+
 config RAPIDIO_MPORT_CDEV
 	tristate "RapidIO /dev mport device driver"
 	depends on RAPIDIO
diff --git a/drivers/rapidio/Makefile b/drivers/rapidio/Makefile
index 6271ada6993f..74dcea45ad49 100644
--- a/drivers/rapidio/Makefile
+++ b/drivers/rapidio/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_RAPIDIO) += rapidio.o
 rapidio-y := rio.o rio-access.o rio-driver.o rio-sysfs.o
 
 obj-$(CONFIG_RAPIDIO_ENUM_BASIC) += rio-scan.o
+obj-$(CONFIG_RAPIDIO_CHMAN)	+= rio_cm.o
 
 obj-$(CONFIG_RAPIDIO)		+= switches/
 obj-$(CONFIG_RAPIDIO)		+= devices/
diff --git a/drivers/rapidio/rio_cm.c b/drivers/rapidio/rio_cm.c
new file mode 100644
index 000000000000..cecc15a880de
--- /dev/null
+++ b/drivers/rapidio/rio_cm.c
@@ -0,0 +1,2366 @@
+/*
+ * rio_cm - RapidIO Channelized Messaging Driver
+ *
+ * Copyright 2013-2016 Integrated Device Technology, Inc.
+ * Copyright (c) 2015, Prodrive Technologies
+ * Copyright (c) 2015, RapidIO Trade Association
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * THIS PROGRAM IS DISTRIBUTED IN THE HOPE THAT IT WILL BE USEFUL,
+ * BUT WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED WARRANTY OF
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.  SEE THE
+ * GNU GENERAL PUBLIC LICENSE FOR MORE DETAILS.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/dma-mapping.h>
+#include <linux/delay.h>
+#include <linux/sched.h>
+#include <linux/rio.h>
+#include <linux/rio_drv.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/interrupt.h>
+#include <linux/cdev.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/reboot.h>
+#include <linux/bitops.h>
+#include <linux/printk.h>
+#include <linux/rio_cm_cdev.h>
+
+#define DRV_NAME        "rio_cm"
+#define DRV_VERSION     "1.0.0"
+#define DRV_AUTHOR      "Alexandre Bounine <alexandre.bounine@idt.com>"
+#define DRV_DESC        "RapidIO Channelized Messaging Driver"
+#define DEV_NAME	"rio_cm"
+
+/* Debug output filtering masks */
+enum {
+	DBG_NONE	= 0,
+	DBG_INIT	= BIT(0), /* driver init */
+	DBG_EXIT	= BIT(1), /* driver exit */
+	DBG_MPORT	= BIT(2), /* mport add/remove */
+	DBG_RDEV	= BIT(3), /* RapidIO device add/remove */
+	DBG_CHOP	= BIT(4), /* channel operations */
+	DBG_WAIT	= BIT(5), /* waiting for events */
+	DBG_TX		= BIT(6), /* message TX */
+	DBG_TX_EVENT	= BIT(7), /* message TX event */
+	DBG_RX_DATA	= BIT(8), /* inbound data messages */
+	DBG_RX_CMD	= BIT(9), /* inbound REQ/ACK/NACK messages */
+	DBG_ALL		= ~0,
+};
+
+#ifdef DEBUG
+#define riocm_debug(level, fmt, arg...) \
+	do { \
+		if (DBG_##level & dbg_level) \
+			pr_debug(DRV_NAME ": %s " fmt "\n", \
+				__func__, ##arg); \
+	} while (0)
+#else
+#define riocm_debug(level, fmt, arg...) \
+		no_printk(KERN_DEBUG pr_fmt(DRV_NAME fmt "\n"), ##arg)
+#endif
+
+#define riocm_warn(fmt, arg...) \
+	pr_warn(DRV_NAME ": %s WARNING " fmt "\n", __func__, ##arg)
+
+#define riocm_error(fmt, arg...) \
+	pr_err(DRV_NAME ": %s ERROR " fmt "\n", __func__, ##arg)
+
+
+static int cmbox = 1;
+module_param(cmbox, int, S_IRUGO);
+MODULE_PARM_DESC(cmbox, "RapidIO Mailbox number (default 1)");
+
+static int chstart = 256;
+module_param(chstart, int, S_IRUGO);
+MODULE_PARM_DESC(chstart,
+		 "Start channel number for dynamic allocation (default 256)");
+
+#ifdef DEBUG
+static u32 dbg_level = DBG_NONE;
+module_param(dbg_level, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(dbg_level, "Debugging output level (default 0 = none)");
+#endif
+
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION(DRV_DESC);
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DRV_VERSION);
+
+#define RIOCM_TX_RING_SIZE	128
+#define RIOCM_RX_RING_SIZE	128
+#define RIOCM_CONNECT_TO	3 /* connect response TO (in sec) */
+
+#define RIOCM_MAX_CHNUM		0xffff /* Use full range of u16 field */
+#define RIOCM_CHNUM_AUTO	0
+#define RIOCM_MAX_EP_COUNT	0x10000 /* Max number of endpoints */
+
+enum rio_cm_state {
+	RIO_CM_IDLE,
+	RIO_CM_CONNECT,
+	RIO_CM_CONNECTED,
+	RIO_CM_DISCONNECT,
+	RIO_CM_CHAN_BOUND,
+	RIO_CM_LISTEN,
+	RIO_CM_DESTROYING,
+};
+
+enum rio_cm_pkt_type {
+	RIO_CM_SYS	= 0xaa,
+	RIO_CM_CHAN	= 0x55,
+};
+
+enum rio_cm_chop {
+	CM_CONN_REQ,
+	CM_CONN_ACK,
+	CM_CONN_CLOSE,
+	CM_DATA_MSG,
+};
+
+struct rio_ch_base_bhdr {
+	u32 src_id;
+	u32 dst_id;
+#define RIO_HDR_LETTER_MASK 0xffff0000
+#define RIO_HDR_MBOX_MASK   0x0000ffff
+	u8  src_mbox;
+	u8  dst_mbox;
+	u8  type;
+} __attribute__((__packed__));
+
+struct rio_ch_chan_hdr {
+	struct rio_ch_base_bhdr bhdr;
+	u8 ch_op;
+	u16 dst_ch;
+	u16 src_ch;
+	u16 msg_len;
+	u16 rsrvd;
+} __attribute__((__packed__));
+
+struct tx_req {
+	struct list_head node;
+	struct rio_dev   *rdev;
+	void		 *buffer;
+	size_t		 len;
+};
+
+struct cm_dev {
+	struct list_head	list;
+	struct rio_mport	*mport;
+	void			*rx_buf[RIOCM_RX_RING_SIZE];
+	int			rx_slots;
+	struct mutex		rx_lock;
+
+	void			*tx_buf[RIOCM_TX_RING_SIZE];
+	int			tx_slot;
+	int			tx_cnt;
+	int			tx_ack_slot;
+	struct list_head	tx_reqs;
+	spinlock_t		tx_lock;
+
+	struct list_head	peers;
+	u32			npeers;
+	struct workqueue_struct *rx_wq;
+	struct work_struct	rx_work;
+};
+
+struct chan_rx_ring {
+	void	*buf[RIOCM_RX_RING_SIZE];
+	int	head;
+	int	tail;
+	int	count;
+
+	/* Tracking RX buffers reported to upper level */
+	void	*inuse[RIOCM_RX_RING_SIZE];
+	int	inuse_cnt;
+};
+
+struct rio_channel {
+	u16			id;	/* local channel ID */
+	struct kref		ref;	/* channel refcount */
+	struct file		*filp;
+	struct cm_dev		*cmdev;	/* associated CM device object */
+	struct rio_dev		*rdev;	/* remote RapidIO device */
+	enum rio_cm_state	state;
+	int			error;
+	spinlock_t		lock;
+	void			*context;
+	u32			loc_destid;	/* local destID */
+	u32			rem_destid;	/* remote destID */
+	u16			rem_channel;	/* remote channel ID */
+	struct list_head	accept_queue;
+	struct list_head	ch_node;
+	struct completion	comp;
+	struct completion	comp_close;
+	struct chan_rx_ring	rx_ring;
+};
+
+struct cm_peer {
+	struct list_head node;
+	struct rio_dev *rdev;
+};
+
+struct rio_cm_work {
+	struct work_struct work;
+	struct cm_dev *cm;
+	void *data;
+};
+
+struct conn_req {
+	struct list_head node;
+	u32 destid;	/* requester destID */
+	u16 chan;	/* requester channel ID */
+	struct cm_dev *cmdev;
+};
+
+/*
+ * A channel_dev structure represents a CM_CDEV
+ * @cdev	Character device
+ * @dev		Associated device object
+ */
+struct channel_dev {
+	struct cdev	cdev;
+	struct device	*dev;
+};
+
+static struct rio_channel *riocm_ch_alloc(u16 ch_num);
+static void riocm_ch_free(struct kref *ref);
+static int riocm_post_send(struct cm_dev *cm, struct rio_dev *rdev,
+			   void *buffer, size_t len);
+static int riocm_ch_close(struct rio_channel *ch);
+
+static DEFINE_SPINLOCK(idr_lock);
+static DEFINE_IDR(ch_idr);
+
+static LIST_HEAD(cm_dev_list);
+static DECLARE_RWSEM(rdev_sem);
+
+static struct class *dev_class;
+static unsigned int dev_major;
+static unsigned int dev_minor_base;
+static dev_t dev_number;
+static struct channel_dev riocm_cdev;
+
+#define is_msg_capable(src_ops, dst_ops)			\
+			((src_ops & RIO_SRC_OPS_DATA_MSG) &&	\
+			 (dst_ops & RIO_DST_OPS_DATA_MSG))
+#define dev_cm_capable(dev) \
+	is_msg_capable(dev->src_ops, dev->dst_ops)
+
+static int riocm_cmp(struct rio_channel *ch, enum rio_cm_state cmp)
+{
+	int ret;
+
+	spin_lock_bh(&ch->lock);
+	ret = (ch->state == cmp);
+	spin_unlock_bh(&ch->lock);
+	return ret;
+}
+
+static int riocm_cmp_exch(struct rio_channel *ch,
+			   enum rio_cm_state cmp, enum rio_cm_state exch)
+{
+	int ret;
+
+	spin_lock_bh(&ch->lock);
+	ret = (ch->state == cmp);
+	if (ret)
+		ch->state = exch;
+	spin_unlock_bh(&ch->lock);
+	return ret;
+}
+
+static enum rio_cm_state riocm_exch(struct rio_channel *ch,
+				    enum rio_cm_state exch)
+{
+	enum rio_cm_state old;
+
+	spin_lock_bh(&ch->lock);
+	old = ch->state;
+	ch->state = exch;
+	spin_unlock_bh(&ch->lock);
+	return old;
+}
+
+static struct rio_channel *riocm_get_channel(u16 nr)
+{
+	struct rio_channel *ch;
+
+	spin_lock_bh(&idr_lock);
+	ch = idr_find(&ch_idr, nr);
+	if (ch)
+		kref_get(&ch->ref);
+	spin_unlock_bh(&idr_lock);
+	return ch;
+}
+
+static void riocm_put_channel(struct rio_channel *ch)
+{
+	kref_put(&ch->ref, riocm_ch_free);
+}
+
+static void *riocm_rx_get_msg(struct cm_dev *cm)
+{
+	void *msg;
+	int i;
+
+	msg = rio_get_inb_message(cm->mport, cmbox);
+	if (msg) {
+		for (i = 0; i < RIOCM_RX_RING_SIZE; i++) {
+			if (cm->rx_buf[i] == msg) {
+				cm->rx_buf[i] = NULL;
+				cm->rx_slots++;
+				break;
+			}
+		}
+
+		if (i == RIOCM_RX_RING_SIZE)
+			riocm_warn("no record for buffer 0x%p", msg);
+	}
+
+	return msg;
+}
+
+/*
+ * riocm_rx_fill - fills a ring of receive buffers for given cm device
+ * @cm: cm_dev object
+ * @nent: max number of entries to fill
+ *
+ * Returns: none
+ */
+static void riocm_rx_fill(struct cm_dev *cm, int nent)
+{
+	int i;
+
+	if (cm->rx_slots == 0)
+		return;
+
+	for (i = 0; i < RIOCM_RX_RING_SIZE && cm->rx_slots && nent; i++) {
+		if (cm->rx_buf[i] == NULL) {
+			cm->rx_buf[i] = kmalloc(RIO_MAX_MSG_SIZE, GFP_KERNEL);
+			if (cm->rx_buf[i] == NULL)
+				break;
+			rio_add_inb_buffer(cm->mport, cmbox, cm->rx_buf[i]);
+			cm->rx_slots--;
+			nent--;
+		}
+	}
+}
+
+/*
+ * riocm_rx_free - frees all receive buffers associated with given cm device
+ * @cm: cm_dev object
+ *
+ * Returns: none
+ */
+static void riocm_rx_free(struct cm_dev *cm)
+{
+	int i;
+
+	for (i = 0; i < RIOCM_RX_RING_SIZE; i++) {
+		if (cm->rx_buf[i] != NULL) {
+			kfree(cm->rx_buf[i]);
+			cm->rx_buf[i] = NULL;
+		}
+	}
+}
+
+/*
+ * riocm_req_handler - connection request handler
+ * @cm: cm_dev object
+ * @req_data: pointer to the request packet
+ *
+ * Returns: 0 if success, or
+ *          -EINVAL if channel is not in correct state,
+ *          -ENODEV if cannot find a channel with specified ID,
+ *          -ENOMEM if unable to allocate memory to store the request
+ */
+static int riocm_req_handler(struct cm_dev *cm, void *req_data)
+{
+	struct rio_channel *ch;
+	struct conn_req *req;
+	struct rio_ch_chan_hdr *hh = req_data;
+	u16 chnum;
+
+	chnum = ntohs(hh->dst_ch);
+
+	ch = riocm_get_channel(chnum);
+
+	if (!ch)
+		return -ENODEV;
+
+	if (ch->state != RIO_CM_LISTEN) {
+		riocm_debug(RX_CMD, "channel %d is not in listen state", chnum);
+		riocm_put_channel(ch);
+		return -EINVAL;
+	}
+
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req) {
+		riocm_put_channel(ch);
+		return -ENOMEM;
+	}
+
+	req->destid = ntohl(hh->bhdr.src_id);
+	req->chan = ntohs(hh->src_ch);
+	req->cmdev = cm;
+
+	spin_lock_bh(&ch->lock);
+	list_add_tail(&req->node, &ch->accept_queue);
+	spin_unlock_bh(&ch->lock);
+	complete(&ch->comp);
+	riocm_put_channel(ch);
+
+	return 0;
+}
+
+/*
+ * riocm_resp_handler - response to connection request handler
+ * @resp_data: pointer to the response packet
+ *
+ * Returns: 0 if success, or
+ *          -EINVAL if channel is not in correct state,
+ *          -ENODEV if cannot find a channel with specified ID,
+ */
+static int riocm_resp_handler(void *resp_data)
+{
+	struct rio_channel *ch;
+	struct rio_ch_chan_hdr *hh = resp_data;
+	u16 chnum;
+
+	chnum = ntohs(hh->dst_ch);
+	ch = riocm_get_channel(chnum);
+	if (!ch)
+		return -ENODEV;
+
+	if (ch->state != RIO_CM_CONNECT) {
+		riocm_put_channel(ch);
+		return -EINVAL;
+	}
+
+	riocm_exch(ch, RIO_CM_CONNECTED);
+	ch->rem_channel = ntohs(hh->src_ch);
+	complete(&ch->comp);
+	riocm_put_channel(ch);
+
+	return 0;
+}
+
+/*
+ * riocm_close_handler - channel close request handler
+ * @req_data: pointer to the request packet
+ *
+ * Returns: 0 if success, or
+ *          -ENODEV if cannot find a channel with specified ID,
+ *            + error codes returned by riocm_ch_close.
+ */
+static int riocm_close_handler(void *data)
+{
+	struct rio_channel *ch;
+	struct rio_ch_chan_hdr *hh = data;
+	int ret;
+
+	riocm_debug(RX_CMD, "for ch=%d", ntohs(hh->dst_ch));
+
+	spin_lock_bh(&idr_lock);
+	ch = idr_find(&ch_idr, ntohs(hh->dst_ch));
+	if (!ch) {
+		spin_unlock_bh(&idr_lock);
+		return -ENODEV;
+	}
+	idr_remove(&ch_idr, ch->id);
+	spin_unlock_bh(&idr_lock);
+
+	riocm_exch(ch, RIO_CM_DISCONNECT);
+
+	ret = riocm_ch_close(ch);
+	if (ret)
+		riocm_debug(RX_CMD, "riocm_ch_close() returned %d", ret);
+
+	return 0;
+}
+
+/*
+ * rio_cm_handler - function that services request (non-data) packets
+ * @cm: cm_dev object
+ * @data: pointer to the packet
+ */
+static void rio_cm_handler(struct cm_dev *cm, void *data)
+{
+	struct rio_ch_chan_hdr *hdr;
+
+	if (!rio_mport_is_running(cm->mport))
+		goto out;
+
+	hdr = data;
+
+	riocm_debug(RX_CMD, "OP=%x for ch=%d from %d",
+		    hdr->ch_op, ntohs(hdr->dst_ch), ntohs(hdr->src_ch));
+
+	switch (hdr->ch_op) {
+	case CM_CONN_REQ:
+		riocm_req_handler(cm, data);
+		break;
+	case CM_CONN_ACK:
+		riocm_resp_handler(data);
+		break;
+	case CM_CONN_CLOSE:
+		riocm_close_handler(data);
+		break;
+	default:
+		riocm_error("Invalid packet header");
+		break;
+	}
+out:
+	kfree(data);
+}
+
+/*
+ * rio_rx_data_handler - received data packet handler
+ * @cm: cm_dev object
+ * @buf: data packet
+ *
+ * Returns: 0 if success, or
+ *          -ENODEV if cannot find a channel with specified ID,
+ *          -EIO if channel is not in CONNECTED state,
+ *          -ENOMEM if channel RX queue is full (packet discarded)
+ */
+static int rio_rx_data_handler(struct cm_dev *cm, void *buf)
+{
+	struct rio_ch_chan_hdr *hdr;
+	struct rio_channel *ch;
+
+	hdr = buf;
+
+	riocm_debug(RX_DATA, "for ch=%d", ntohs(hdr->dst_ch));
+
+	ch = riocm_get_channel(ntohs(hdr->dst_ch));
+	if (!ch) {
+		/* Discard data message for non-existing channel */
+		kfree(buf);
+		return -ENODEV;
+	}
+
+	/* Place pointer to the buffer into channel's RX queue */
+	spin_lock(&ch->lock);
+
+	if (ch->state != RIO_CM_CONNECTED) {
+		/* Channel is not ready to receive data, discard a packet */
+		riocm_debug(RX_DATA, "ch=%d is in wrong state=%d",
+			    ch->id, ch->state);
+		spin_unlock(&ch->lock);
+		kfree(buf);
+		riocm_put_channel(ch);
+		return -EIO;
+	}
+
+	if (ch->rx_ring.count == RIOCM_RX_RING_SIZE) {
+		/* If RX ring is full, discard a packet */
+		riocm_debug(RX_DATA, "ch=%d is full", ch->id);
+		spin_unlock(&ch->lock);
+		kfree(buf);
+		riocm_put_channel(ch);
+		return -ENOMEM;
+	}
+
+	ch->rx_ring.buf[ch->rx_ring.head] = buf;
+	ch->rx_ring.head++;
+	ch->rx_ring.count++;
+	ch->rx_ring.head %= RIOCM_RX_RING_SIZE;
+
+	complete(&ch->comp);
+
+	spin_unlock(&ch->lock);
+	riocm_put_channel(ch);
+
+	return 0;
+}
+
+/*
+ * rio_ibmsg_handler - inbound message packet handler
+ */
+static void rio_ibmsg_handler(struct work_struct *work)
+{
+	struct cm_dev *cm = container_of(work, struct cm_dev, rx_work);
+	void *data;
+	struct rio_ch_chan_hdr *hdr;
+
+	if (!rio_mport_is_running(cm->mport))
+		return;
+
+	while (1) {
+		mutex_lock(&cm->rx_lock);
+		data = riocm_rx_get_msg(cm);
+		if (data)
+			riocm_rx_fill(cm, 1);
+		mutex_unlock(&cm->rx_lock);
+
+		if (data == NULL)
+			break;
+
+		hdr = data;
+
+		if (hdr->bhdr.type != RIO_CM_CHAN) {
+			/* For now simply discard packets other than channel */
+			riocm_error("Unsupported TYPE code (0x%x). Msg dropped",
+				    hdr->bhdr.type);
+			kfree(data);
+			continue;
+		}
+
+		/* Process a channel message */
+		if (hdr->ch_op == CM_DATA_MSG)
+			rio_rx_data_handler(cm, data);
+		else
+			rio_cm_handler(cm, data);
+	}
+}
+
+static void riocm_inb_msg_event(struct rio_mport *mport, void *dev_id,
+				int mbox, int slot)
+{
+	struct cm_dev *cm = dev_id;
+
+	if (rio_mport_is_running(cm->mport) && !work_pending(&cm->rx_work))
+		queue_work(cm->rx_wq, &cm->rx_work);
+}
+
+/*
+ * rio_txcq_handler - TX completion handler
+ * @cm: cm_dev object
+ * @slot: TX queue slot
+ *
+ * TX completion handler also ensures that pending request packets are placed
+ * into transmit queue as soon as a free slot becomes available. This is done
+ * to give higher priority to request packets during high intensity data flow.
+ */
+static void rio_txcq_handler(struct cm_dev *cm, int slot)
+{
+	int ack_slot;
+
+	/* ATTN: Add TX completion notification if/when direct buffer
+	 * transfer is implemented. At this moment only correct tracking
+	 * of tx_cnt is important.
+	 */
+	riocm_debug(TX_EVENT, "for mport_%d slot %d tx_cnt %d",
+		    cm->mport->id, slot, cm->tx_cnt);
+
+	spin_lock(&cm->tx_lock);
+	ack_slot = cm->tx_ack_slot;
+
+	if (ack_slot == slot)
+		riocm_debug(TX_EVENT, "slot == ack_slot");
+
+	while (cm->tx_cnt && ((ack_slot != slot) ||
+	       (cm->tx_cnt == RIOCM_TX_RING_SIZE))) {
+
+		cm->tx_buf[ack_slot] = NULL;
+		++ack_slot;
+		ack_slot &= (RIOCM_TX_RING_SIZE - 1);
+		cm->tx_cnt--;
+	}
+
+	if (cm->tx_cnt < 0 || cm->tx_cnt > RIOCM_TX_RING_SIZE)
+		riocm_error("tx_cnt %d out of sync", cm->tx_cnt);
+
+	WARN_ON((cm->tx_cnt < 0) || (cm->tx_cnt > RIOCM_TX_RING_SIZE));
+
+	cm->tx_ack_slot = ack_slot;
+
+	/*
+	 * If there are pending requests, insert them into transmit queue
+	 */
+	if (!list_empty(&cm->tx_reqs) && (cm->tx_cnt < RIOCM_TX_RING_SIZE)) {
+		struct tx_req *req, *_req;
+		int rc;
+
+		list_for_each_entry_safe(req, _req, &cm->tx_reqs, node) {
+			list_del(&req->node);
+			cm->tx_buf[cm->tx_slot] = req->buffer;
+			rc = rio_add_outb_message(cm->mport, req->rdev, cmbox,
+						  req->buffer, req->len);
+			kfree(req->buffer);
+			kfree(req);
+
+			++cm->tx_cnt;
+			++cm->tx_slot;
+			cm->tx_slot &= (RIOCM_TX_RING_SIZE - 1);
+			if (cm->tx_cnt == RIOCM_TX_RING_SIZE)
+				break;
+		}
+	}
+
+	spin_unlock(&cm->tx_lock);
+}
+
+static void riocm_outb_msg_event(struct rio_mport *mport, void *dev_id,
+				 int mbox, int slot)
+{
+	struct cm_dev *cm = dev_id;
+
+	if (cm && rio_mport_is_running(cm->mport))
+		rio_txcq_handler(cm, slot);
+}
+
+static int riocm_queue_req(struct cm_dev *cm, struct rio_dev *rdev,
+			   void *buffer, size_t len)
+{
+	unsigned long flags;
+	struct tx_req *treq;
+
+	treq = kzalloc(sizeof(*treq), GFP_KERNEL);
+	if (treq == NULL)
+		return -ENOMEM;
+
+	treq->rdev = rdev;
+	treq->buffer = buffer;
+	treq->len = len;
+
+	spin_lock_irqsave(&cm->tx_lock, flags);
+	list_add_tail(&treq->node, &cm->tx_reqs);
+	spin_unlock_irqrestore(&cm->tx_lock, flags);
+	return 0;
+}
+
+/*
+ * riocm_post_send - helper function that places packet into msg TX queue
+ * @cm: cm_dev object
+ * @rdev: target RapidIO device object (required by outbound msg interface)
+ * @buffer: pointer to a packet buffer to send
+ * @len: length of data to transfer
+ * @req: request priority flag
+ *
+ * Returns: 0 if success, or error code otherwise.
+ */
+static int riocm_post_send(struct cm_dev *cm, struct rio_dev *rdev,
+			   void *buffer, size_t len)
+{
+	int rc;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cm->tx_lock, flags);
+
+	if (cm->mport == NULL) {
+		rc = -ENODEV;
+		goto err_out;
+	}
+
+	if (cm->tx_cnt == RIOCM_TX_RING_SIZE) {
+		riocm_debug(TX, "Tx Queue is full");
+		rc = -EBUSY;
+		goto err_out;
+	}
+
+	cm->tx_buf[cm->tx_slot] = buffer;
+	rc = rio_add_outb_message(cm->mport, rdev, cmbox, buffer, len);
+
+	riocm_debug(TX, "Add buf@%p destid=%x tx_slot=%d tx_cnt=%d",
+		 buffer, rdev->destid, cm->tx_slot, cm->tx_cnt);
+
+	++cm->tx_cnt;
+	++cm->tx_slot;
+	cm->tx_slot &= (RIOCM_TX_RING_SIZE - 1);
+
+err_out:
+	spin_unlock_irqrestore(&cm->tx_lock, flags);
+	return rc;
+}
+
+/*
+ * riocm_ch_send - sends a data packet to a remote device
+ * @ch_id: local channel ID
+ * @buf: pointer to a data buffer to send (including CM header)
+ * @len: length of data to transfer (including CM header)
+ *
+ * ATTN: ASSUMES THAT THE HEADER SPACE IS RESERVED PART OF THE DATA PACKET
+ *
+ * Returns: 0 if success, or
+ *          -EINVAL if one or more input parameters is/are not valid,
+ *          -ENODEV if cannot find a channel with specified ID,
+ *          -EAGAIN if a channel is not in CONNECTED state,
+ *	    + error codes returned by HW send routine.
+ */
+static int riocm_ch_send(u16 ch_id, void *buf, int len)
+{
+	struct rio_channel *ch;
+	struct rio_ch_chan_hdr *hdr;
+	int ret;
+
+	if (buf == NULL || ch_id == 0 || len == 0 || len > RIO_MAX_MSG_SIZE)
+		return -EINVAL;
+
+	ch = riocm_get_channel(ch_id);
+	if (!ch) {
+		riocm_error("%s(%d) ch_%d not found", current->comm,
+			    task_pid_nr(current), ch_id);
+		return -ENODEV;
+	}
+
+	if (!riocm_cmp(ch, RIO_CM_CONNECTED)) {
+		ret = -EAGAIN;
+		goto err_out;
+	}
+
+	/*
+	 * Fill buffer header section with corresponding channel data
+	 */
+	hdr = buf;
+
+	hdr->bhdr.src_id = htonl(ch->loc_destid);
+	hdr->bhdr.dst_id = htonl(ch->rem_destid);
+	hdr->bhdr.src_mbox = cmbox;
+	hdr->bhdr.dst_mbox = cmbox;
+	hdr->bhdr.type = RIO_CM_CHAN;
+	hdr->ch_op = CM_DATA_MSG;
+	hdr->dst_ch = htons(ch->rem_channel);
+	hdr->src_ch = htons(ch->id);
+	hdr->msg_len = htons((u16)len);
+
+	/* ATTN: the function call below relies on the fact that underlying
+	 * HW-specific add_outb_message() routine copies TX data into its own
+	 * internal transfer buffer (true for all RIONET compatible mport
+	 * drivers). Must be reviewed if mport driver uses the buffer directly.
+	 */
+
+	ret = riocm_post_send(ch->cmdev, ch->rdev, buf, len);
+	if (ret)
+		riocm_debug(TX, "ch %d send_err=%d", ch->id, ret);
+err_out:
+	riocm_put_channel(ch);
+	return ret;
+}
+
+static int riocm_ch_free_rxbuf(struct rio_channel *ch, void *buf)
+{
+	int i, ret = -EINVAL;
+
+	spin_lock_bh(&ch->lock);
+
+	for (i = 0; i < RIOCM_RX_RING_SIZE; i++) {
+		if (ch->rx_ring.inuse[i] == buf) {
+			ch->rx_ring.inuse[i] = NULL;
+			ch->rx_ring.inuse_cnt--;
+			ret = 0;
+			break;
+		}
+	}
+
+	spin_unlock_bh(&ch->lock);
+
+	if (!ret)
+		kfree(buf);
+
+	return ret;
+}
+
+/*
+ * riocm_ch_receive - fetch a data packet received for the specified channel
+ * @ch: channel object
+ * @buf: pointer to a packet buffer
+ * @timeout: timeout to wait for incoming packet (in jiffies)
+ *
+ * Returns: 0 and valid buffer pointer if success, or NULL pointer and one of:
+ *          -EAGAIN if a channel is not in CONNECTED state,
+ *          -ENOMEM if in-use tracking queue is full,
+ *          -ETIME if wait timeout expired, -EINTR if wait was interrupted,
+ *          -ECONNRESET if the channel left CONNECTED state during the wait.
+ */
+static int riocm_ch_receive(struct rio_channel *ch, void **buf, long timeout)
+{
+	void *rxmsg = NULL;
+	int i, ret = 0;
+	long wret;
+
+	if (!riocm_cmp(ch, RIO_CM_CONNECTED)) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	if (ch->rx_ring.inuse_cnt == RIOCM_RX_RING_SIZE) {
+		/* If we do not have entries to track buffers given to upper
+		 * layer, reject request.
+		 */
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	wret = wait_for_completion_interruptible_timeout(&ch->comp, timeout);
+
+	riocm_debug(WAIT, "wait on %d returned %ld", ch->id, wret);
+
+	if (!wret)
+		ret = -ETIME;
+	else if (wret == -ERESTARTSYS)
+		ret = -EINTR;
+	else
+		ret = riocm_cmp(ch, RIO_CM_CONNECTED) ? 0 : -ECONNRESET;
+
+	if (ret)
+		goto out;
+
+	spin_lock_bh(&ch->lock);
+
+	rxmsg = ch->rx_ring.buf[ch->rx_ring.tail];
+	ch->rx_ring.buf[ch->rx_ring.tail] = NULL;
+	ch->rx_ring.count--;
+	ch->rx_ring.tail++;
+	ch->rx_ring.tail %= RIOCM_RX_RING_SIZE;
+	ret = -ENOMEM;
+
+	for (i = 0; i < RIOCM_RX_RING_SIZE; i++) {
+		if (ch->rx_ring.inuse[i] == NULL) {
+			ch->rx_ring.inuse[i] = rxmsg;
+			ch->rx_ring.inuse_cnt++;
+			ret = 0;
+			break;
+		}
+	}
+
+	if (ret) {
+		/* We have no entry to store pending message: drop it */
+		kfree(rxmsg);
+		rxmsg = NULL;
+	}
+
+	spin_unlock_bh(&ch->lock);
+out:
+	*buf = rxmsg;
+	return ret;
+}
+
+/*
+ * riocm_ch_connect - sends a connect request to a remote device
+ * @loc_ch: local channel ID
+ * @cm: CM device to send connect request
+ * @peer: target RapidIO device
+ * @rem_ch: remote channel ID
+ *
+ * Returns: 0 if success, or
+ *          -EINVAL if the channel is not in IDLE state,
+ *          -ENODEV if channel is not found, -ENOMEM if allocation failed,
+ *          -ETIME if ACK response timeout expired,
+ *          -EINTR if wait for response was interrupted.
+ */
+static int riocm_ch_connect(u16 loc_ch, struct cm_dev *cm,
+			    struct cm_peer *peer, u16 rem_ch)
+{
+	struct rio_channel *ch = NULL;
+	struct rio_ch_chan_hdr *hdr;
+	int ret;
+	long wret;
+
+	ch = riocm_get_channel(loc_ch);
+	if (!ch)
+		return -ENODEV;
+
+	if (!riocm_cmp_exch(ch, RIO_CM_IDLE, RIO_CM_CONNECT)) {
+		ret = -EINVAL;
+		goto conn_done;
+	}
+
+	ch->cmdev = cm;
+	ch->rdev = peer->rdev;
+	ch->context = NULL;
+	ch->loc_destid = cm->mport->host_deviceid;
+	ch->rem_channel = rem_ch;
+
+	/*
+	 * Send connect request to the remote RapidIO device
+	 */
+
+	hdr = kzalloc(sizeof(*hdr), GFP_KERNEL);
+	if (hdr == NULL) {
+		ret = -ENOMEM;
+		goto conn_done;
+	}
+
+	hdr->bhdr.src_id = htonl(ch->loc_destid);
+	hdr->bhdr.dst_id = htonl(peer->rdev->destid);
+	hdr->bhdr.src_mbox = cmbox;
+	hdr->bhdr.dst_mbox = cmbox;
+	hdr->bhdr.type = RIO_CM_CHAN;
+	hdr->ch_op = CM_CONN_REQ;
+	hdr->dst_ch = htons(rem_ch);
+	hdr->src_ch = htons(loc_ch);
+
+	/* ATTN: the function call below relies on the fact that underlying
+	 * HW-specific add_outb_message() routine copies TX data into its
+	 * internal transfer buffer. Must be reviewed if mport driver uses
+	 * this buffer directly.
+	 */
+	ret = riocm_post_send(cm, peer->rdev, hdr, sizeof(*hdr));
+
+	if (ret != -EBUSY) {
+		kfree(hdr);
+	} else {
+		ret = riocm_queue_req(cm, peer->rdev, hdr, sizeof(*hdr));
+		if (ret)
+			kfree(hdr);
+	}
+
+	if (ret) {
+		riocm_cmp_exch(ch, RIO_CM_CONNECT, RIO_CM_IDLE);
+		goto conn_done;
+	}
+
+	/* Wait for connect response from the remote device */
+	wret = wait_for_completion_interruptible_timeout(&ch->comp,
+							 RIOCM_CONNECT_TO * HZ);
+	riocm_debug(WAIT, "wait on %d returns %ld", ch->id, wret);
+
+	if (!wret)
+		ret = -ETIME;
+	else if (wret == -ERESTARTSYS)
+		ret = -EINTR;
+	else
+		ret = riocm_cmp(ch, RIO_CM_CONNECTED) ? 0 : -1;
+
+conn_done:
+	riocm_put_channel(ch);
+	return ret;
+}
+
+static int riocm_send_ack(struct rio_channel *ch)
+{
+	struct rio_ch_chan_hdr *hdr;
+	int ret;
+
+	hdr = kzalloc(sizeof(*hdr), GFP_KERNEL);
+	if (hdr == NULL)
+		return -ENOMEM;
+
+	hdr->bhdr.src_id = htonl(ch->loc_destid);
+	hdr->bhdr.dst_id = htonl(ch->rem_destid);
+	hdr->dst_ch = htons(ch->rem_channel);
+	hdr->src_ch = htons(ch->id);
+	hdr->bhdr.src_mbox = cmbox;
+	hdr->bhdr.dst_mbox = cmbox;
+	hdr->bhdr.type = RIO_CM_CHAN;
+	hdr->ch_op = CM_CONN_ACK;
+
+	/* ATTN: the function call below relies on the fact that underlying
+	 * add_outb_message() routine copies TX data into its internal transfer
+	 * buffer. Review if switching to direct buffer version.
+	 */
+	ret = riocm_post_send(ch->cmdev, ch->rdev, hdr, sizeof(*hdr));
+
+	if (ret == -EBUSY && !riocm_queue_req(ch->cmdev,
+					      ch->rdev, hdr, sizeof(*hdr)))
+		return 0;
+	kfree(hdr);
+
+	if (ret)
+		riocm_error("send ACK to ch_%d on %s failed (ret=%d)",
+			    ch->id, rio_name(ch->rdev), ret);
+	return ret;
+}
+
+/*
+ * riocm_ch_accept - accept incoming connection request
+ * @ch_id: ID of the listening channel
+ * @new_ch_id: pointer to a location that receives the new channel ID
+ * @timeout: wait timeout (if 0 non-blocking call, do not wait if connection
+ *           request is not available).
+ *
+ * Returns: pointer to new channel struct if success, or error-valued pointer:
+ *          -ENODEV - cannot find the requester's peer device object,
+ *          -EINVAL - the channel does not exist or is not in LISTEN state,
+ *          -EAGAIN - no connection request available immediately (timeout=0),
+ *          -ENOMEM - unable to allocate new channel,
+ *          -ETIME - wait timeout expired,
+ *          -EINTR - wait was interrupted.
+ */
+static struct rio_channel *riocm_ch_accept(u16 ch_id, u16 *new_ch_id,
+					   long timeout)
+{
+	struct rio_channel *ch = NULL;
+	struct rio_channel *new_ch = NULL;
+	struct conn_req *req;
+	struct cm_peer *peer;
+	int found = 0;
+	int err = 0;
+	long wret;
+
+	ch = riocm_get_channel(ch_id);
+	if (!ch)
+		return ERR_PTR(-EINVAL);
+
+	if (!riocm_cmp(ch, RIO_CM_LISTEN)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	/* Don't sleep if this is a non-blocking call */
+	if (!timeout) {
+		if (!try_wait_for_completion(&ch->comp)) {
+			err = -EAGAIN;
+			goto err_put;
+		}
+	} else {
+		riocm_debug(WAIT, "on %d", ch->id);
+
+		wret = wait_for_completion_interruptible_timeout(&ch->comp,
+								 timeout);
+		if (!wret) {
+			err = -ETIME;
+			goto err_put;
+		} else if (wret == -ERESTARTSYS) {
+			err = -EINTR;
+			goto err_put;
+		}
+	}
+
+	spin_lock_bh(&ch->lock);
+
+	if (ch->state != RIO_CM_LISTEN) {
+		err = -ECANCELED;
+	} else if (list_empty(&ch->accept_queue)) {
+		riocm_debug(WAIT, "on %d accept_queue is empty on completion",
+			    ch->id);
+		err = -EIO;
+	}
+
+	spin_unlock_bh(&ch->lock);
+
+	if (err) {
+		riocm_debug(WAIT, "on %d returns %d", ch->id, err);
+		goto err_put;
+	}
+
+	/* Create new channel for this connection */
+	new_ch = riocm_ch_alloc(RIOCM_CHNUM_AUTO);
+
+	if (IS_ERR(new_ch)) {
+		riocm_error("failed to get channel for new req (%ld)",
+			PTR_ERR(new_ch));
+		err = -ENOMEM;
+		goto err_put;
+	}
+
+	spin_lock_bh(&ch->lock);
+
+	req = list_first_entry(&ch->accept_queue, struct conn_req, node);
+	list_del(&req->node);
+	new_ch->cmdev = ch->cmdev;
+	new_ch->loc_destid = ch->loc_destid;
+	new_ch->rem_destid = req->destid;
+	new_ch->rem_channel = req->chan;
+
+	spin_unlock_bh(&ch->lock);
+	riocm_put_channel(ch);
+	kfree(req);
+
+	down_read(&rdev_sem);
+	/* Find requester's device object */
+	list_for_each_entry(peer, &new_ch->cmdev->peers, node) {
+		if (peer->rdev->destid == new_ch->rem_destid) {
+			riocm_debug(RX_CMD, "found matching device(%s)",
+				    rio_name(peer->rdev));
+			found = 1;
+			break;
+		}
+	}
+	up_read(&rdev_sem);
+
+	if (!found) {
+		/* If peer device object not found, simply ignore the request */
+		err = -ENODEV;
+		goto err_nodev;
+	}
+
+	new_ch->rdev = peer->rdev;
+	new_ch->state = RIO_CM_CONNECTED;
+	spin_lock_init(&new_ch->lock);
+
+	/* Acknowledge the connection request. */
+	riocm_send_ack(new_ch);
+
+	*new_ch_id = new_ch->id;
+	return new_ch;
+err_put:
+	riocm_put_channel(ch);
+err_nodev:
+	if (!IS_ERR_OR_NULL(new_ch)) {
+		spin_lock_bh(&idr_lock);
+		idr_remove(&ch_idr, new_ch->id);
+		spin_unlock_bh(&idr_lock);
+		riocm_put_channel(new_ch);
+	}
+	*new_ch_id = 0;
+	return ERR_PTR(err);
+}
+
+/*
+ * riocm_ch_listen - puts a channel into LISTEN state
+ * @ch_id: channel ID
+ *
+ * Returns: 0 if success, or
+ *          -EINVAL if the specified channel does not exist or
+ *                  is not in CHAN_BOUND state.
+ */
+static int riocm_ch_listen(u16 ch_id)
+{
+	struct rio_channel *ch = riocm_get_channel(ch_id);
+	int ret = 0;
+
+	riocm_debug(CHOP, "(ch_%d)", ch_id);
+	if (!ch)
+		return -EINVAL;
+	if (!riocm_cmp_exch(ch, RIO_CM_CHAN_BOUND, RIO_CM_LISTEN))
+		ret = -EINVAL;
+	riocm_put_channel(ch);
+	return ret;
+}
+
+/*
+ * riocm_ch_bind - associate a channel object and an mport device
+ * @ch_id: channel ID
+ * @mport_id: local mport device ID
+ * @context: pointer to the additional caller's context
+ *
+ * Returns: 0 if success, or
+ *          -ENODEV if cannot find specified mport,
+ *          -EINVAL if the specified channel does not exist or
+ *                  is not in IDLE state.
+ */
+static int riocm_ch_bind(u16 ch_id, u8 mport_id, void *context)
+{
+	struct rio_channel *ch = NULL;
+	struct cm_dev *cm;
+	int rc = -ENODEV;
+
+	riocm_debug(CHOP, "ch_%d to mport_%d", ch_id, mport_id);
+
+	/* Find matching cm_dev object */
+	down_read(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if ((cm->mport->id == mport_id) &&
+		     rio_mport_is_running(cm->mport)) {
+			rc = 0;
+			break;
+		}
+	}
+
+	if (rc)
+		goto exit;
+
+	ch = riocm_get_channel(ch_id);
+	if (!ch) {
+		rc = -EINVAL;
+		goto exit;
+	}
+
+	spin_lock_bh(&ch->lock);
+	if (ch->state != RIO_CM_IDLE) {
+		spin_unlock_bh(&ch->lock);
+		rc = -EINVAL;
+		goto err_put;
+	}
+
+	ch->cmdev = cm;
+	ch->loc_destid = cm->mport->host_deviceid;
+	ch->context = context;
+	ch->state = RIO_CM_CHAN_BOUND;
+	spin_unlock_bh(&ch->lock);
+err_put:
+	riocm_put_channel(ch);
+exit:
+	up_read(&rdev_sem);
+	return rc;
+}
+
+/*
+ * riocm_ch_alloc - channel object allocation helper routine
+ * @ch_num: channel ID (1 ... RIOCM_MAX_CHNUM, 0 = automatic)
+ *
+ * Return value: pointer to newly created channel object,
+ *               or error-valued pointer
+ */
+static struct rio_channel *riocm_ch_alloc(u16 ch_num)
+{
+	int id;
+	int start, end;
+	struct rio_channel *ch;
+
+	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+	if (!ch)
+		return ERR_PTR(-ENOMEM);
+
+	if (ch_num) {
+		/* If requested, try to obtain the specified channel ID */
+		start = ch_num;
+		end = ch_num + 1;
+	} else {
+		/* Obtain channel ID from the dynamic allocation range */
+		start = chstart;
+		end = RIOCM_MAX_CHNUM + 1;
+	}
+
+	idr_preload(GFP_KERNEL);
+	spin_lock_bh(&idr_lock);
+	id = idr_alloc_cyclic(&ch_idr, ch, start, end, GFP_NOWAIT);
+	spin_unlock_bh(&idr_lock);
+	idr_preload_end();
+
+	if (id < 0) {
+		kfree(ch);
+		return ERR_PTR(id == -ENOSPC ? -EBUSY : id);
+	}
+
+	ch->id = (u16)id;
+	ch->state = RIO_CM_IDLE;
+	spin_lock_init(&ch->lock);
+	INIT_LIST_HEAD(&ch->accept_queue);
+	INIT_LIST_HEAD(&ch->ch_node);
+	init_completion(&ch->comp);
+	init_completion(&ch->comp_close);
+	kref_init(&ch->ref);
+	ch->rx_ring.head = 0;
+	ch->rx_ring.tail = 0;
+	ch->rx_ring.count = 0;
+	ch->rx_ring.inuse_cnt = 0;
+
+	return ch;
+}
+
+/*
+ * riocm_ch_create - creates a new channel object and allocates ID for it
+ * @ch_num: channel ID (1 ... RIOCM_MAX_CHNUM, 0 = automatic)
+ *
+ * Allocates and initializes a new channel object. If the parameter ch_num > 0
+ * and is within the valid range, riocm_ch_create tries to allocate the
+ * specified ID for the new channel. If ch_num = 0, channel ID will be assigned
+ * automatically from the range (chstart ... RIOCM_MAX_CHNUM).
+ * Module parameter 'chstart' defines start of an ID range available for dynamic
+ * allocation. Range below 'chstart' is reserved for pre-defined ID numbers.
+ * Available channel numbers are limited by 16-bit size of channel numbers used
+ * in the packet header.
+ *
+ * Return value: PTR to rio_channel structure if successful (with channel number
+ *               updated via pointer) or error-valued pointer if error.
+ */
+static struct rio_channel *riocm_ch_create(u16 *ch_num)
+{
+	struct rio_channel *ch = NULL;
+
+	ch = riocm_ch_alloc(*ch_num);
+
+	if (IS_ERR(ch))
+		riocm_debug(CHOP, "Failed to allocate channel %d (err=%ld)",
+			    *ch_num, PTR_ERR(ch));
+	else
+		*ch_num = ch->id;
+
+	return ch;
+}
+
+/*
+ * riocm_ch_free - channel object release routine
+ * @ref: pointer to a channel's kref structure
+ */
+static void riocm_ch_free(struct kref *ref)
+{
+	struct rio_channel *ch = container_of(ref, struct rio_channel, ref);
+	int i;
+
+	riocm_debug(CHOP, "(ch_%d)", ch->id);
+
+	if (ch->rx_ring.inuse_cnt) {
+		for (i = 0;
+		     i < RIOCM_RX_RING_SIZE && ch->rx_ring.inuse_cnt; i++) {
+			if (ch->rx_ring.inuse[i] != NULL) {
+				kfree(ch->rx_ring.inuse[i]);
+				ch->rx_ring.inuse_cnt--;
+			}
+		}
+	}
+
+	if (ch->rx_ring.count)
+		for (i = 0; i < RIOCM_RX_RING_SIZE && ch->rx_ring.count; i++) {
+			if (ch->rx_ring.buf[i] != NULL) {
+				kfree(ch->rx_ring.buf[i]);
+				ch->rx_ring.count--;
+			}
+		}
+
+	complete(&ch->comp_close);
+}
+
+static int riocm_send_close(struct rio_channel *ch)
+{
+	struct rio_ch_chan_hdr *hdr;
+	int ret;
+
+	/*
+	 * Send CH_CLOSE notification to the remote RapidIO device
+	 */
+
+	hdr = kzalloc(sizeof(*hdr), GFP_KERNEL);
+	if (hdr == NULL)
+		return -ENOMEM;
+
+	hdr->bhdr.src_id = htonl(ch->loc_destid);
+	hdr->bhdr.dst_id = htonl(ch->rem_destid);
+	hdr->bhdr.src_mbox = cmbox;
+	hdr->bhdr.dst_mbox = cmbox;
+	hdr->bhdr.type = RIO_CM_CHAN;
+	hdr->ch_op = CM_CONN_CLOSE;
+	hdr->dst_ch = htons(ch->rem_channel);
+	hdr->src_ch = htons(ch->id);
+
+	/* ATTN: the function call below relies on the fact that underlying
+	 * add_outb_message() routine copies TX data into its internal transfer
+	 * buffer. Needs to be reviewed if switched to direct buffer mode.
+	 */
+	ret = riocm_post_send(ch->cmdev, ch->rdev, hdr, sizeof(*hdr));
+
+	if (ret == -EBUSY && !riocm_queue_req(ch->cmdev, ch->rdev,
+					      hdr, sizeof(*hdr)))
+		return 0;
+	kfree(hdr);
+
+	if (ret)
+		riocm_error("ch(%d) send CLOSE failed (ret=%d)", ch->id, ret);
+
+	return ret;
+}
+
+/*
+ * riocm_ch_close - closes the specified channel object (by local request)
+ * @ch: channel to be closed
+ */
+static int riocm_ch_close(struct rio_channel *ch)
+{
+	unsigned long tmo = msecs_to_jiffies(3000);
+	enum rio_cm_state state;
+	long wret;
+	int ret = 0;
+
+	riocm_debug(CHOP, "ch_%d by %s(%d)",
+		    ch->id, current->comm, task_pid_nr(current));
+
+	state = riocm_exch(ch, RIO_CM_DESTROYING);
+	if (state == RIO_CM_CONNECTED)
+		riocm_send_close(ch);
+
+	complete_all(&ch->comp);
+
+	riocm_put_channel(ch);
+	wret = wait_for_completion_interruptible_timeout(&ch->comp_close, tmo);
+
+	riocm_debug(WAIT, "wait on %d returns %ld", ch->id, wret);
+
+	if (wret == 0) {
+		/* Timeout on wait occurred */
+		riocm_debug(CHOP, "%s(%d) timed out waiting for ch %d",
+		       current->comm, task_pid_nr(current), ch->id);
+		ret = -ETIMEDOUT;
+	} else if (wret == -ERESTARTSYS) {
+		/* Wait_for_completion was interrupted by a signal */
+		riocm_debug(CHOP, "%s(%d) wait for ch %d was interrupted",
+			current->comm, task_pid_nr(current), ch->id);
+		ret = -EINTR;
+	}
+
+	if (!ret) {
+		riocm_debug(CHOP, "ch_%d resources released", ch->id);
+		kfree(ch);
+	} else {
+		riocm_debug(CHOP, "failed to release ch_%d resources", ch->id);
+	}
+
+	return ret;
+}
+
+/*
+ * riocm_cdev_open() - Open character device
+ */
+static int riocm_cdev_open(struct inode *inode, struct file *filp)
+{
+	riocm_debug(INIT, "by %s(%d) filp=%p ",
+		    current->comm, task_pid_nr(current), filp);
+
+	if (list_empty(&cm_dev_list))
+		return -ENODEV;
+
+	return 0;
+}
+
+/*
+ * riocm_cdev_release() - Release character device
+ */
+static int riocm_cdev_release(struct inode *inode, struct file *filp)
+{
+	struct rio_channel *ch, *_c;
+	unsigned int i;
+	LIST_HEAD(list);
+
+	riocm_debug(EXIT, "by %s(%d) filp=%p",
+		    current->comm, task_pid_nr(current), filp);
+
+	/* Check if there are channels associated with this file descriptor */
+	spin_lock_bh(&idr_lock);
+	idr_for_each_entry(&ch_idr, ch, i) {
+		if (ch && ch->filp == filp) {
+			riocm_debug(EXIT, "ch_%d not released by %s(%d)",
+				    ch->id, current->comm,
+				    task_pid_nr(current));
+			idr_remove(&ch_idr, ch->id);
+			list_add(&ch->ch_node, &list);
+		}
+	}
+	spin_unlock_bh(&idr_lock);
+
+	if (!list_empty(&list)) {
+		list_for_each_entry_safe(ch, _c, &list, ch_node) {
+			list_del(&ch->ch_node);
+			riocm_ch_close(ch);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * cm_ep_get_list_size() - Reports number of endpoints in the network
+ */
+static int cm_ep_get_list_size(void __user *arg)
+{
+	u32 __user *p = arg;
+	u32 mport_id;
+	u32 count = 0;
+	struct cm_dev *cm;
+
+	if (get_user(mport_id, p))
+		return -EFAULT;
+	if (mport_id >= RIO_MAX_MPORTS)
+		return -EINVAL;
+
+	/* Find a matching cm_dev object */
+	down_read(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (cm->mport->id == mport_id) {
+			count = cm->npeers;
+			up_read(&rdev_sem);
+			if (copy_to_user(arg, &count, sizeof(u32)))
+				return -EFAULT;
+			return 0;
+		}
+	}
+	up_read(&rdev_sem);
+
+	return -ENODEV;
+}
+
+/*
+ * cm_ep_get_list() - Returns list of attached endpoints
+ */
+static int cm_ep_get_list(void __user *arg)
+{
+	struct cm_dev *cm;
+	struct cm_peer *peer;
+	u32 info[2];
+	void *buf;
+	u32 nent;
+	u32 *entry_ptr;
+	u32 i = 0;
+	int ret = 0;
+
+	if (copy_from_user(&info, arg, sizeof(info)))
+		return -EFAULT;
+
+	if (info[1] >= RIO_MAX_MPORTS || info[0] > RIOCM_MAX_EP_COUNT)
+		return -EINVAL;
+
+	/* Find a matching cm_dev object */
+	down_read(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list)
+		if (cm->mport->id == (u8)info[1])
+			goto found;
+
+	up_read(&rdev_sem);
+	return -ENODEV;
+
+found:
+	nent = min(info[0], cm->npeers);
+	buf = kcalloc(nent + 2, sizeof(u32), GFP_KERNEL);
+	if (!buf) {
+		up_read(&rdev_sem);
+		return -ENOMEM;
+	}
+
+	entry_ptr = (u32 *)((uintptr_t)buf + 2*sizeof(u32));
+
+	list_for_each_entry(peer, &cm->peers, node) {
+		if (i == nent)
+			break;
+		*entry_ptr++ = (u32)peer->rdev->destid;
+		i++;
+	}
+	up_read(&rdev_sem);
+
+	((u32 *)buf)[0] = i; /* report an updated number of entries */
+	((u32 *)buf)[1] = info[1]; /* put back an mport ID */
+	if (copy_to_user(arg, buf, sizeof(u32) * (nent + 2)))
+		ret = -EFAULT;
+
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * cm_mport_get_list() - Returns list of available local mport devices
+ */
+static int cm_mport_get_list(void __user *arg)
+{
+	int ret = 0;
+	u32 entries;
+	void *buf;
+	struct cm_dev *cm;
+	u32 *entry_ptr;
+	int count = 0;
+
+	if (copy_from_user(&entries, arg, sizeof(entries)))
+		return -EFAULT;
+	if (entries == 0 || entries > RIO_MAX_MPORTS)
+		return -EINVAL;
+	buf = kcalloc(entries + 1, sizeof(u32), GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	/* Scan all registered cm_dev objects */
+	entry_ptr = (u32 *)((uintptr_t)buf + sizeof(u32));
+	down_read(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (count++ < entries) {
+			*entry_ptr = (cm->mport->id << 16) |
+				      cm->mport->host_deviceid;
+			entry_ptr++;
+		}
+	}
+	up_read(&rdev_sem);
+
+	*((u32 *)buf) = count; /* report a real number of entries */
+	if (copy_to_user(arg, buf, sizeof(u32) * (entry_ptr - (u32 *)buf)))
+		ret = -EFAULT;
+
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * cm_chan_create() - Create a message exchange channel
+ */
+static int cm_chan_create(struct file *filp, void __user *arg)
+{
+	u16 __user *p = arg;
+	u16 ch_num;
+	struct rio_channel *ch;
+
+	if (get_user(ch_num, p))
+		return -EFAULT;
+
+	riocm_debug(CHOP, "ch_%d requested by %s(%d)",
+		    ch_num, current->comm, task_pid_nr(current));
+	ch = riocm_ch_create(&ch_num);
+	if (IS_ERR(ch))
+		return PTR_ERR(ch);
+
+	ch->filp = filp;
+	riocm_debug(CHOP, "ch_%d created by %s(%d)",
+		    ch_num, current->comm, task_pid_nr(current));
+	return put_user(ch_num, p);
+}
+
+/*
+ * cm_chan_close() - Close channel
+ * @filp:	Pointer to file object
+ * @arg:	Channel to close
+ */
+static int cm_chan_close(struct file *filp, void __user *arg)
+{
+	u16 __user *p = arg;
+	u16 ch_num;
+	struct rio_channel *ch;
+
+	if (get_user(ch_num, p))
+		return -EFAULT;
+
+	riocm_debug(CHOP, "ch_%d by %s(%d)",
+		    ch_num, current->comm, task_pid_nr(current));
+
+	spin_lock_bh(&idr_lock);
+	ch = idr_find(&ch_idr, ch_num);
+	if (!ch) {
+		spin_unlock_bh(&idr_lock);
+		return 0;
+	}
+	if (ch->filp != filp) {
+		spin_unlock_bh(&idr_lock);
+		return -EINVAL;
+	}
+	idr_remove(&ch_idr, ch->id);
+	spin_unlock_bh(&idr_lock);
+
+	return riocm_ch_close(ch);
+}
+
+/*
+ * cm_chan_bind() - Bind channel
+ * @arg:	Channel number
+ */
+static int cm_chan_bind(void __user *arg)
+{
+	struct rio_cm_channel chan;
+
+	if (copy_from_user(&chan, arg, sizeof(chan)))
+		return -EFAULT;
+	if (chan.mport_id >= RIO_MAX_MPORTS)
+		return -EINVAL;
+
+	return riocm_ch_bind(chan.id, chan.mport_id, NULL);
+}
+
+/*
+ * cm_chan_listen() - Listen on channel
+ * @arg:	Channel number
+ */
+static int cm_chan_listen(void __user *arg)
+{
+	u16 __user *p = arg;
+	u16 ch_num;
+
+	if (get_user(ch_num, p))
+		return -EFAULT;
+
+	return riocm_ch_listen(ch_num);
+}
+
+/*
+ * cm_chan_accept() - Accept incoming connection
+ * @filp:	Pointer to file object
+ * @arg:	Accept parameters (channel number and wait timeout)
+ */
+static int cm_chan_accept(struct file *filp, void __user *arg)
+{
+	struct rio_cm_accept param;
+	long accept_to;
+	struct rio_channel *ch;
+
+	if (copy_from_user(&param, arg, sizeof(param)))
+		return -EFAULT;
+
+	riocm_debug(CHOP, "on ch_%d by %s(%d)",
+		    param.ch_num, current->comm, task_pid_nr(current));
+
+	accept_to = param.wait_to ?
+			msecs_to_jiffies(param.wait_to) : 0;
+
+	ch = riocm_ch_accept(param.ch_num, &param.ch_num, accept_to);
+	if (IS_ERR(ch))
+		return PTR_ERR(ch);
+	ch->filp = filp;
+
+	riocm_debug(CHOP, "new ch_%d for %s(%d)",
+		    ch->id, current->comm, task_pid_nr(current));
+
+	if (copy_to_user(arg, &param, sizeof(param)))
+		return -EFAULT;
+	return 0;
+}
+
+/*
+ * cm_chan_connect() - Connect on channel
+ * @arg:	Channel information
+ */
+static int cm_chan_connect(void __user *arg)
+{
+	struct rio_cm_channel chan;
+	struct cm_dev *cm;
+	struct cm_peer *peer;
+	int ret = -ENODEV;
+
+	if (copy_from_user(&chan, arg, sizeof(chan)))
+		return -EFAULT;
+	if (chan.mport_id >= RIO_MAX_MPORTS)
+		return -EINVAL;
+
+	down_read(&rdev_sem);
+
+	/* Find matching cm_dev object */
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (cm->mport->id == chan.mport_id) {
+			ret = 0;
+			break;
+		}
+	}
+
+	if (ret)
+		goto err_out;
+
+	if (chan.remote_destid >= RIO_ANY_DESTID(cm->mport->sys_size)) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	/* Find corresponding RapidIO endpoint device object */
+	ret = -ENODEV;
+
+	list_for_each_entry(peer, &cm->peers, node) {
+		if (peer->rdev->destid == chan.remote_destid) {
+			ret = 0;
+			break;
+		}
+	}
+
+	if (ret)
+		goto err_out;
+
+	up_read(&rdev_sem);
+
+	return riocm_ch_connect(chan.id, cm, peer, chan.remote_channel);
+err_out:
+	up_read(&rdev_sem);
+	return ret;
+}
+
+/*
+ * cm_chan_msg_send() - Send a message through channel
+ * @arg:	Outbound message information
+ */
+static int cm_chan_msg_send(void __user *arg)
+{
+	struct rio_cm_msg msg;
+	void *buf;
+	int ret = 0;
+
+	if (copy_from_user(&msg, arg, sizeof(msg)))
+		return -EFAULT;
+	if (msg.size > RIO_MAX_MSG_SIZE)
+		return -EINVAL;
+
+	buf = kmalloc(msg.size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (copy_from_user(buf, (void __user *)(uintptr_t)msg.msg, msg.size)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = riocm_ch_send(msg.ch_num, buf, msg.size);
+out:
+	kfree(buf);
+	return ret;
+}
+
+/*
+ * cm_chan_msg_rcv() - Receive a message through channel
+ * @arg:	Inbound message information
+ */
+static int cm_chan_msg_rcv(void __user *arg)
+{
+	struct rio_cm_msg msg;
+	struct rio_channel *ch;
+	void *buf;
+	long rxto;
+	int ret = 0, msg_size;
+
+	if (copy_from_user(&msg, arg, sizeof(msg)))
+		return -EFAULT;
+
+	if (msg.ch_num == 0 || msg.size == 0)
+		return -EINVAL;
+
+	ch = riocm_get_channel(msg.ch_num);
+	if (!ch)
+		return -ENODEV;
+
+	rxto = msg.rxto ? msecs_to_jiffies(msg.rxto) : MAX_SCHEDULE_TIMEOUT;
+
+	ret = riocm_ch_receive(ch, &buf, rxto);
+	if (ret)
+		goto out;
+
+	msg_size = min(msg.size, (u16)(RIO_MAX_MSG_SIZE));
+
+	if (copy_to_user((void __user *)(uintptr_t)msg.msg, buf, msg_size))
+		ret = -EFAULT;
+
+	riocm_ch_free_rxbuf(ch, buf);
+out:
+	riocm_put_channel(ch);
+	return ret;
+}
+
+/*
+ * riocm_cdev_ioctl() - IOCTL requests handler
+ */
+static long
+riocm_cdev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case RIO_CM_EP_GET_LIST_SIZE:
+		return cm_ep_get_list_size((void __user *)arg);
+	case RIO_CM_EP_GET_LIST:
+		return cm_ep_get_list((void __user *)arg);
+	case RIO_CM_CHAN_CREATE:
+		return cm_chan_create(filp, (void __user *)arg);
+	case RIO_CM_CHAN_CLOSE:
+		return cm_chan_close(filp, (void __user *)arg);
+	case RIO_CM_CHAN_BIND:
+		return cm_chan_bind((void __user *)arg);
+	case RIO_CM_CHAN_LISTEN:
+		return cm_chan_listen((void __user *)arg);
+	case RIO_CM_CHAN_ACCEPT:
+		return cm_chan_accept(filp, (void __user *)arg);
+	case RIO_CM_CHAN_CONNECT:
+		return cm_chan_connect((void __user *)arg);
+	case RIO_CM_CHAN_SEND:
+		return cm_chan_msg_send((void __user *)arg);
+	case RIO_CM_CHAN_RECEIVE:
+		return cm_chan_msg_rcv((void __user *)arg);
+	case RIO_CM_MPORT_GET_LIST:
+		return cm_mport_get_list((void __user *)arg);
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+
+static const struct file_operations riocm_cdev_fops = {
+	.owner		= THIS_MODULE,
+	.open		= riocm_cdev_open,
+	.release	= riocm_cdev_release,
+	.unlocked_ioctl = riocm_cdev_ioctl,
+};
+
+/*
+ * riocm_add_dev - add new remote RapidIO device into channel management core
+ * @dev: device object associated with RapidIO device
+ * @sif: subsystem interface
+ *
+ * Adds the specified RapidIO device (if applicable) into peers list of
+ * the corresponding channel management device (cm_dev).
+ */
+static int riocm_add_dev(struct device *dev, struct subsys_interface *sif)
+{
+	struct cm_peer *peer;
+	struct rio_dev *rdev = to_rio_dev(dev);
+	struct cm_dev *cm;
+
+	/* Check if the remote device has capabilities required to support CM */
+	if (!dev_cm_capable(rdev))
+		return 0;
+
+	riocm_debug(RDEV, "(%s)", rio_name(rdev));
+
+	peer = kmalloc(sizeof(*peer), GFP_KERNEL);
+	if (!peer)
+		return -ENOMEM;
+
+	/* Find a corresponding cm_dev object */
+	down_write(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (cm->mport == rdev->net->hport)
+			goto found;
+	}
+
+	up_write(&rdev_sem);
+	kfree(peer);
+	return -ENODEV;
+
+found:
+	peer->rdev = rdev;
+	list_add_tail(&peer->node, &cm->peers);
+	cm->npeers++;
+
+	up_write(&rdev_sem);
+	return 0;
+}
+
+/*
+ * riocm_remove_dev - remove remote RapidIO device from channel management core
+ * @dev: device object associated with RapidIO device
+ * @sif: subsystem interface
+ *
+ * Removes the specified RapidIO device (if applicable) from peers list of
+ * the corresponding channel management device (cm_dev).
+ */
+static void riocm_remove_dev(struct device *dev, struct subsys_interface *sif)
+{
+	struct rio_dev *rdev = to_rio_dev(dev);
+	struct cm_dev *cm;
+	struct cm_peer *peer;
+	struct rio_channel *ch, *_c;
+	unsigned int i;
+	bool found = false;
+	LIST_HEAD(list);
+
+	/* Check if the remote device has capabilities required to support CM */
+	if (!dev_cm_capable(rdev))
+		return;
+
+	riocm_debug(RDEV, "(%s)", rio_name(rdev));
+
+	/* Find matching cm_dev object */
+	down_write(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (cm->mport == rdev->net->hport) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		up_write(&rdev_sem);
+		return;
+	}
+
+	/* Remove remote device from the list of peers */
+	found = false;
+	list_for_each_entry(peer, &cm->peers, node) {
+		if (peer->rdev == rdev) {
+			riocm_debug(RDEV, "removing peer %s", rio_name(rdev));
+			found = true;
+			list_del(&peer->node);
+			cm->npeers--;
+			kfree(peer);
+			break;
+		}
+	}
+
+	up_write(&rdev_sem);
+
+	if (!found)
+		return;
+
+	/*
+	 * Release channels associated with this peer
+	 */
+
+	spin_lock_bh(&idr_lock);
+	idr_for_each_entry(&ch_idr, ch, i) {
+		if (ch && ch->rdev == rdev) {
+			if (atomic_read(&rdev->state) != RIO_DEVICE_SHUTDOWN)
+				riocm_exch(ch, RIO_CM_DISCONNECT);
+			idr_remove(&ch_idr, ch->id);
+			list_add(&ch->ch_node, &list);
+		}
+	}
+	spin_unlock_bh(&idr_lock);
+
+	if (!list_empty(&list)) {
+		list_for_each_entry_safe(ch, _c, &list, ch_node) {
+			list_del(&ch->ch_node);
+			riocm_ch_close(ch);
+		}
+	}
+}
+
+/*
+ * riocm_cdev_add() - Create rio_cm char device
+ * @devno: device number assigned to device (MAJ + MIN)
+ */
+static int riocm_cdev_add(dev_t devno)
+{
+	int ret;
+
+	cdev_init(&riocm_cdev.cdev, &riocm_cdev_fops);
+	riocm_cdev.cdev.owner = THIS_MODULE;
+	ret = cdev_add(&riocm_cdev.cdev, devno, 1);
+	if (ret < 0) {
+		riocm_error("Cannot register a device with error %d", ret);
+		return ret;
+	}
+
+	riocm_cdev.dev = device_create(dev_class, NULL, devno, NULL, DEV_NAME);
+	if (IS_ERR(riocm_cdev.dev)) {
+		cdev_del(&riocm_cdev.cdev);
+		return PTR_ERR(riocm_cdev.dev);
+	}
+
+	riocm_debug(MPORT, "Added %s cdev(%d:%d)",
+		    DEV_NAME, MAJOR(devno), MINOR(devno));
+
+	return 0;
+}
+
+/*
+ * riocm_add_mport - add new local mport device into channel management core
+ * @dev: device object associated with mport
+ * @class_intf: class interface
+ *
+ * When a new mport device is added, CM immediately reserves inbound and
+ * outbound RapidIO mailboxes that will be used.
+ */
+static int riocm_add_mport(struct device *dev,
+			   struct class_interface *class_intf)
+{
+	int rc;
+	int i;
+	struct cm_dev *cm;
+	struct rio_mport *mport = to_rio_mport(dev);
+
+	riocm_debug(MPORT, "add mport %s", mport->name);
+
+	cm = kzalloc(sizeof(*cm), GFP_KERNEL);
+	if (!cm)
+		return -ENOMEM;
+
+	cm->mport = mport;
+
+	rc = rio_request_outb_mbox(mport, cm, cmbox,
+				   RIOCM_TX_RING_SIZE, riocm_outb_msg_event);
+	if (rc) {
+		riocm_error("failed to allocate OBMBOX_%d on %s",
+			    cmbox, mport->name);
+		kfree(cm);
+		return -ENODEV;
+	}
+
+	rc = rio_request_inb_mbox(mport, cm, cmbox,
+				  RIOCM_RX_RING_SIZE, riocm_inb_msg_event);
+	if (rc) {
+		riocm_error("failed to allocate IBMBOX_%d on %s",
+			    cmbox, mport->name);
+		rio_release_outb_mbox(mport, cmbox);
+		kfree(cm);
+		return -ENODEV;
+	}
+
+	/*
+	 * Allocate and register inbound messaging buffers to be ready
+	 * to receive channel and system management requests
+	 */
+	for (i = 0; i < RIOCM_RX_RING_SIZE; i++)
+		cm->rx_buf[i] = NULL;
+
+	cm->rx_slots = RIOCM_RX_RING_SIZE;
+	mutex_init(&cm->rx_lock);
+	riocm_rx_fill(cm, RIOCM_RX_RING_SIZE);
+	cm->rx_wq = create_workqueue(DRV_NAME "/rxq");
+	INIT_WORK(&cm->rx_work, rio_ibmsg_handler);
+
+	cm->tx_slot = 0;
+	cm->tx_cnt = 0;
+	cm->tx_ack_slot = 0;
+	spin_lock_init(&cm->tx_lock);
+
+	INIT_LIST_HEAD(&cm->peers);
+	cm->npeers = 0;
+	INIT_LIST_HEAD(&cm->tx_reqs);
+
+	down_write(&rdev_sem);
+	list_add_tail(&cm->list, &cm_dev_list);
+	up_write(&rdev_sem);
+
+	return 0;
+}
+
+/*
+ * riocm_remove_mport - remove local mport device from channel management core
+ * @dev: device object associated with mport
+ * @class_intf: class interface
+ *
+ * Removes a local mport device from the list of registered devices that provide
+ * channel management services. Does nothing if the specified mport is not
+ * registered with the CM core.
+ */
+static void riocm_remove_mport(struct device *dev,
+			       struct class_interface *class_intf)
+{
+	struct rio_mport *mport = to_rio_mport(dev);
+	struct cm_dev *cm;
+	struct cm_peer *peer, *temp;
+	struct rio_channel *ch, *_c;
+	unsigned int i;
+	bool found = false;
+	LIST_HEAD(list);
+
+	riocm_debug(MPORT, "%s", mport->name);
+
+	/* Find a matching cm_dev object */
+	down_write(&rdev_sem);
+	list_for_each_entry(cm, &cm_dev_list, list) {
+		if (cm->mport == mport) {
+			list_del(&cm->list);
+			found = true;
+			break;
+		}
+	}
+	up_write(&rdev_sem);
+	if (!found)
+		return;
+
+	flush_workqueue(cm->rx_wq);
+	destroy_workqueue(cm->rx_wq);
+
+	/* Release channels bound to this mport */
+	spin_lock_bh(&idr_lock);
+	idr_for_each_entry(&ch_idr, ch, i) {
+		if (ch->cmdev == cm) {
+			riocm_debug(RDEV, "%s drop ch_%d",
+				    mport->name, ch->id);
+			idr_remove(&ch_idr, ch->id);
+			list_add(&ch->ch_node, &list);
+		}
+	}
+	spin_unlock_bh(&idr_lock);
+
+	if (!list_empty(&list)) {
+		list_for_each_entry_safe(ch, _c, &list, ch_node) {
+			list_del(&ch->ch_node);
+			riocm_ch_close(ch);
+		}
+	}
+
+	rio_release_inb_mbox(mport, cmbox);
+	rio_release_outb_mbox(mport, cmbox);
+
+	/* Remove and free peer entries */
+	if (!list_empty(&cm->peers))
+		riocm_debug(RDEV, "ATTN: peer list not empty");
+	list_for_each_entry_safe(peer, temp, &cm->peers, node) {
+		riocm_debug(RDEV, "removing peer %s", rio_name(peer->rdev));
+		list_del(&peer->node);
+		kfree(peer);
+	}
+
+	riocm_rx_free(cm);
+	kfree(cm);
+	riocm_debug(MPORT, "%s done", mport->name);
+}
+
+static int rio_cm_shutdown(struct notifier_block *nb, unsigned long code,
+	void *unused)
+{
+	struct rio_channel *ch;
+	unsigned int i;
+
+	riocm_debug(EXIT, ".");
+
+	spin_lock_bh(&idr_lock);
+	idr_for_each_entry(&ch_idr, ch, i) {
+		riocm_debug(EXIT, "close ch %d", ch->id);
+		if (ch->state == RIO_CM_CONNECTED)
+			riocm_send_close(ch);
+	}
+	spin_unlock_bh(&idr_lock);
+
+	return NOTIFY_DONE;
+}
+
+/*
+ * riocm_interface handles addition/removal of remote RapidIO devices
+ */
+static struct subsys_interface riocm_interface = {
+	.name		= "rio_cm",
+	.subsys		= &rio_bus_type,
+	.add_dev	= riocm_add_dev,
+	.remove_dev	= riocm_remove_dev,
+};
+
+/*
+ * rio_mport_interface handles addition/removal of local mport devices
+ */
+static struct class_interface rio_mport_interface __refdata = {
+	.class = &rio_mport_class,
+	.add_dev = riocm_add_mport,
+	.remove_dev = riocm_remove_mport,
+};
+
+static struct notifier_block rio_cm_notifier = {
+	.notifier_call = rio_cm_shutdown,
+};
+
+static int __init riocm_init(void)
+{
+	int ret;
+
+	/* Create device class needed by udev */
+	dev_class = class_create(THIS_MODULE, DRV_NAME);
+	if (IS_ERR(dev_class)) {
+		riocm_error("Cannot create " DRV_NAME " class");
+		return PTR_ERR(dev_class);
+	}
+
+	ret = alloc_chrdev_region(&dev_number, 0, 1, DRV_NAME);
+	if (ret) {
+		class_destroy(dev_class);
+		return ret;
+	}
+
+	dev_major = MAJOR(dev_number);
+	dev_minor_base = MINOR(dev_number);
+	riocm_debug(INIT, "Registered class with %d major", dev_major);
+
+	/*
+	 * Register as rapidio_port class interface to get notifications about
+	 * mport additions and removals.
+	 */
+	ret = class_interface_register(&rio_mport_interface);
+	if (ret) {
+		riocm_error("class_interface_register error: %d", ret);
+		goto err_reg;
+	}
+
+	/*
+	 * Register as RapidIO bus interface to get notifications about
+	 * addition/removal of remote RapidIO devices.
+	 */
+	ret = subsys_interface_register(&riocm_interface);
+	if (ret) {
+		riocm_error("subsys_interface_register error: %d", ret);
+		goto err_cl;
+	}
+
+	ret = register_reboot_notifier(&rio_cm_notifier);
+	if (ret) {
+		riocm_error("failed to register reboot notifier (err=%d)", ret);
+		goto err_sif;
+	}
+
+	ret = riocm_cdev_add(dev_number);
+	if (ret) {
+		unregister_reboot_notifier(&rio_cm_notifier);
+		ret = -ENODEV;
+		goto err_sif;
+	}
+
+	return 0;
+err_sif:
+	subsys_interface_unregister(&riocm_interface);
+err_cl:
+	class_interface_unregister(&rio_mport_interface);
+err_reg:
+	unregister_chrdev_region(dev_number, 1);
+	class_destroy(dev_class);
+	return ret;
+}
+
+static void __exit riocm_exit(void)
+{
+	riocm_debug(EXIT, "enter");
+	unregister_reboot_notifier(&rio_cm_notifier);
+	subsys_interface_unregister(&riocm_interface);
+	class_interface_unregister(&rio_mport_interface);
+	idr_destroy(&ch_idr);
+
+	device_unregister(riocm_cdev.dev);
+	cdev_del(&(riocm_cdev.cdev));
+
+	class_destroy(dev_class);
+	unregister_chrdev_region(dev_number, 1);
+}
+
+late_initcall(riocm_init);
+module_exit(riocm_exit);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6d4e92ccdc91..c44747c0796a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -357,6 +357,7 @@ header-y += reiserfs_fs.h
 header-y += reiserfs_xattr.h
 header-y += resource.h
 header-y += rfkill.h
+header-y += rio_cm_cdev.h
 header-y += rio_mport_cdev.h
 header-y += romfs_fs.h
 header-y += rose.h
diff --git a/include/uapi/linux/rio_cm_cdev.h b/include/uapi/linux/rio_cm_cdev.h
new file mode 100644
index 000000000000..6edb900d318d
--- /dev/null
+++ b/include/uapi/linux/rio_cm_cdev.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2015, Integrated Device Technology Inc.
+ * Copyright (c) 2015, Prodrive Technologies
+ * Copyright (c) 2015, RapidIO Trade Association
+ * All rights reserved.
+ *
+ * This software is available to you under a choice of one of two licenses.
+ * You may choose to be licensed under the terms of the GNU General Public
+ * License(GPL) Version 2, or the BSD-3 Clause license below:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ * 3. Neither the name of the copyright holder nor the names of its contributors
+ * may be used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+ * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+ * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RIO_CM_CDEV_H_
+#define _RIO_CM_CDEV_H_
+
+#include <linux/types.h>
+
+struct rio_cm_channel {
+	__u16 id;
+	__u16 remote_channel;
+	__u16 remote_destid;
+	__u8 mport_id;
+};
+
+struct rio_cm_msg {
+	__u16 ch_num;
+	__u16 size;
+	__u32 rxto;	/* receive timeout in mSec. 0 = blocking */
+	__u64 msg;
+};
+
+struct rio_cm_accept {
+	__u16 ch_num;
+	__u16 pad0;
+	__u32 wait_to;	/* accept timeout in mSec. 0 = blocking */
+};
+
+/* RapidIO Channelized Messaging Driver IOCTLs */
+#define RIO_CM_IOC_MAGIC	'c'
+
+#define RIO_CM_EP_GET_LIST_SIZE	_IOWR(RIO_CM_IOC_MAGIC, 1, __u32)
+#define RIO_CM_EP_GET_LIST	_IOWR(RIO_CM_IOC_MAGIC, 2, __u32)
+#define RIO_CM_CHAN_CREATE	_IOWR(RIO_CM_IOC_MAGIC, 3, __u16)
+#define RIO_CM_CHAN_CLOSE	_IOW(RIO_CM_IOC_MAGIC, 4, __u16)
+#define RIO_CM_CHAN_BIND	_IOW(RIO_CM_IOC_MAGIC, 5, struct rio_cm_channel)
+#define RIO_CM_CHAN_LISTEN	_IOW(RIO_CM_IOC_MAGIC, 6, __u16)
+#define RIO_CM_CHAN_ACCEPT	_IOWR(RIO_CM_IOC_MAGIC, 7, struct rio_cm_accept)
+#define RIO_CM_CHAN_CONNECT	_IOW(RIO_CM_IOC_MAGIC, 8, struct rio_cm_channel)
+#define RIO_CM_CHAN_SEND	_IOW(RIO_CM_IOC_MAGIC, 9, struct rio_cm_msg)
+#define RIO_CM_CHAN_RECEIVE	_IOWR(RIO_CM_IOC_MAGIC, 10, struct rio_cm_msg)
+#define RIO_CM_MPORT_GET_LIST	_IOWR(RIO_CM_IOC_MAGIC, 11, __u32)
+
+#endif /* _RIO_CM_CDEV_H_ */

From ea87b8e1f72896d9852f534f77aeec5eaf377b32 Mon Sep 17 00:00:00 2001
From: Joe Perches <joe@perches.com>
Date: Tue, 2 Aug 2016 14:06:28 -0700
Subject: [PATCH 089/111] rapidio: remove unnecessary 0x prefixes before %pa
 extension uses

Patch series "RapidIO subsystem updates".

This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6.  The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).

This patch (of 13):

This is the RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)

Since commit 3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.

These 0x prefixes in the formats are unnecessary.

Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/devices/rio_mport_cdev.c | 4 ++--
 drivers/rapidio/devices/tsi721.c         | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index e165b7ce29d7..de0c692587dd 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -2242,7 +2242,7 @@ static void mport_mm_open(struct vm_area_struct *vma)
 {
 	struct rio_mport_mapping *map = vma->vm_private_data;
 
-	rmcd_debug(MMAP, "0x%pad", &map->phys_addr);
+	rmcd_debug(MMAP, "%pad", &map->phys_addr);
 	kref_get(&map->ref);
 }
 
@@ -2250,7 +2250,7 @@ static void mport_mm_close(struct vm_area_struct *vma)
 {
 	struct rio_mport_mapping *map = vma->vm_private_data;
 
-	rmcd_debug(MMAP, "0x%pad", &map->phys_addr);
+	rmcd_debug(MMAP, "%pad", &map->phys_addr);
 	mutex_lock(&map->md->buf_mutex);
 	kref_put(&map->ref, mport_release_mapping);
 	mutex_unlock(&map->md->buf_mutex);
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index b5b455614f8a..4c20e9927a7e 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -1101,7 +1101,7 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 		ibw_start = lstart & ~(ibw_size - 1);
 
 		tsi_debug(IBW, &priv->pdev->dev,
-			"Direct (RIO_0x%llx -> PCIe_0x%pad), size=0x%x, ibw_start = 0x%llx",
+			"Direct (RIO_0x%llx -> PCIe_%pad), size=0x%x, ibw_start = 0x%llx",
 			rstart, &lstart, size, ibw_start);
 
 		while ((lstart + size) > (ibw_start + ibw_size)) {
@@ -1120,7 +1120,7 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 
 	} else {
 		tsi_debug(IBW, &priv->pdev->dev,
-			"Translated (RIO_0x%llx -> PCIe_0x%pad), size=0x%x",
+			"Translated (RIO_0x%llx -> PCIe_%pad), size=0x%x",
 			rstart, &lstart, size);
 
 		if (!is_power_of_2(size) || size < 0x1000 ||
@@ -1215,7 +1215,7 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 	priv->ibwin_cnt--;
 
 	tsi_debug(IBW, &priv->pdev->dev,
-		"Configured IBWIN%d (RIO_0x%llx -> PCIe_0x%pad), size=0x%llx",
+		"Configured IBWIN%d (RIO_0x%llx -> PCIe_%pad), size=0x%llx",
 		i, ibw_start, &loc_start, ibw_size);
 
 	return 0;
@@ -1237,7 +1237,7 @@ static void tsi721_rio_unmap_inb_mem(struct rio_mport *mport,
 	int i;
 
 	tsi_debug(IBW, &priv->pdev->dev,
-		"Unmap IBW mapped to PCIe_0x%pad", &lstart);
+		"Unmap IBW mapped to PCIe_%pad", &lstart);
 
 	/* Search for matching active inbound translation window */
 	for (i = 0; i < TSI721_IBWIN_NUM; i++) {

From cca446d4ce883ae28ff589dd1c9aef8d5148c7f7 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:31 -0700
Subject: [PATCH 090/111] rapidio/documentation: fix mangled paragraph in
 mport_cdev

Minor edits to correct parameter description.

This patch is applicable to kernel versions starting from v4.6.

Link: http://lkml.kernel.org/r/1469125134-16523-3-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/rapidio/mport_cdev.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/rapidio/mport_cdev.txt b/Documentation/rapidio/mport_cdev.txt
index 20c120d4b3b8..6e491a662461 100644
--- a/Documentation/rapidio/mport_cdev.txt
+++ b/Documentation/rapidio/mport_cdev.txt
@@ -82,8 +82,7 @@ III. Module parameters
 
 - 'dbg_level' - This parameter allows to control amount of debug information
         generated by this device driver. This parameter is formed by set of
-        This parameter can be changed bit masks that correspond to the specific
-        functional block.
+        bit masks that correspond to the specific functional blocks.
         For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'
         This parameter can be changed dynamically.
         Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.

From f8e3a68c05f0f09a0da947b9d447268d2d3f8780 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:34 -0700
Subject: [PATCH 091/111] rapidio: fix return value description for dma_prep
 functions

Update the return value description for the rio_dma_prep_... functions to
include the error-valued pointer that can be returned by HW mport device
drivers.  Return values from these functions must be checked using the
IS_ERR_OR_NULL macro.

This patch is applicable to kernel versions starting from v4.6-rc1.

Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/rio.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
index 0dcaa660cba1..840802943dc0 100644
--- a/drivers/rapidio/rio.c
+++ b/drivers/rapidio/rio.c
@@ -1848,7 +1848,9 @@ EXPORT_SYMBOL_GPL(rio_release_dma);
  * Initializes RapidIO capable DMA channel for the specified data transfer.
  * Uses DMA channel private extension to pass information related to remote
  * target RIO device.
- * Returns pointer to DMA transaction descriptor or NULL if failed.
+ *
+ * Returns: pointer to DMA transaction descriptor if successful,
+ *          error-valued pointer or NULL if failed.
  */
 struct dma_async_tx_descriptor *rio_dma_prep_xfer(struct dma_chan *dchan,
 	u16 destid, struct rio_dma_data *data,
@@ -1883,7 +1885,9 @@ EXPORT_SYMBOL_GPL(rio_dma_prep_xfer);
  * Initializes RapidIO capable DMA channel for the specified data transfer.
  * Uses DMA channel private extension to pass information related to remote
  * target RIO device.
- * Returns pointer to DMA transaction descriptor or NULL if failed.
+ *
+ * Returns: pointer to DMA transaction descriptor if successful,
+ *          error-valued pointer or NULL if failed.
  */
 struct dma_async_tx_descriptor *rio_dma_prep_slave_sg(struct rio_dev *rdev,
 	struct dma_chan *dchan, struct rio_dma_data *data,

From 4498c31adff99d243b34b0bf39363a35ea070928 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:37 -0700
Subject: [PATCH 092/111] rapidio/tsi721_dma: add channel mask and queue size
 parameters

Add module parameters to allow load time configuration of DMA channels.

Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.

A HW DMA channel selector mask makes it possible to define which channels
(of the seven available) are controlled by the mport device driver and to
reserve the others for direct use by other drivers.

Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/rapidio/tsi721.txt     | 14 ++++++++++++++
 drivers/rapidio/devices/tsi721.h     |  2 +-
 drivers/rapidio/devices/tsi721_dma.c | 26 +++++++++++++++++---------
 3 files changed, 32 insertions(+), 10 deletions(-)
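
Note for reviewers: a minimal userspace sketch of how the dma_sel mask is
interpreted, mirroring the test used in tsi721_register_dma(). The constants
DMA_MAXCH (8) and DMACH_MAINT (7) are assumptions taken from this patch's
description (seven data channels plus the maintenance channel), and the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_MAXCH	8	/* assumed total number of HW DMA channels */
#define DMACH_MAINT	7	/* channel reserved for maintenance requests */

/* A channel is registered only if it is not the maintenance channel
 * and its bit is set in the selection mask. */
bool dma_chan_selected(uint8_t dma_sel, int ch)
{
	return ch != DMACH_MAINT && (dma_sel & (1u << ch)) != 0;
}

/* Count how many channels a given mask hands to the driver. */
int dma_chan_count(uint8_t dma_sel)
{
	int ch, n = 0;

	for (ch = 0; ch < DMA_MAXCH; ch++)
		if (dma_chan_selected(dma_sel, ch))
			n++;

	return n;
}
```

With the default mask 0x7f all seven data channels are selected. A mask such
as 0x0f would leave channels 4 to 6 untouched by this driver, and bit 7 has no
effect because the maintenance channel is always skipped.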

diff --git a/Documentation/rapidio/tsi721.txt b/Documentation/rapidio/tsi721.txt
index 7c1c7bf48ec0..0e0e90bef882 100644
--- a/Documentation/rapidio/tsi721.txt
+++ b/Documentation/rapidio/tsi721.txt
@@ -25,6 +25,20 @@ fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
         This parameter can be changed dynamically.
         Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
 
+- 'dma_desc_per_channel' - This parameter defines number of hardware buffer
+        descriptors allocated for each registered Tsi721 DMA channel.
+        Its default value is 128.
+
+- 'dma_txqueue_sz' - DMA transactions queue size. Defines number of pending
+        transaction requests that can be accepted by each DMA channel.
+        Default value is 16.
+
+- 'dma_sel' - DMA channel selection mask. Bitmask that defines which hardware
+        DMA channels (0 ... 6) will be registered with DmaEngine core.
+        If bit is set to 1, the corresponding DMA channel will be registered.
+        DMA channels not selected by this mask will not be used by this device
+        driver. Default value is 0x7f (use all channels).
+
 II. Known problems
 
   None.
diff --git a/drivers/rapidio/devices/tsi721.h b/drivers/rapidio/devices/tsi721.h
index 5456dbddc929..5941437cbdd1 100644
--- a/drivers/rapidio/devices/tsi721.h
+++ b/drivers/rapidio/devices/tsi721.h
@@ -661,7 +661,7 @@ enum dma_rtype {
  */
 #define TSI721_DMA_CHNUM	TSI721_DMA_MAXCH
 
-#define TSI721_DMACH_MAINT	0	/* DMA channel for maint requests */
+#define TSI721_DMACH_MAINT	7	/* DMA channel for maint requests */
 #define TSI721_DMACH_MAINT_NBD	32	/* Number of BDs for maint requests */
 
 #define TSI721_DMACH_DMA	1	/* DMA channel for data transfers */
diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c
index 155cae1e62de..13c669bac019 100644
--- a/drivers/rapidio/devices/tsi721_dma.c
+++ b/drivers/rapidio/devices/tsi721_dma.c
@@ -36,18 +36,26 @@
 
 #include "tsi721.h"
 
-#define TSI721_DMA_TX_QUEUE_SZ	16	/* number of transaction descriptors */
-
 #ifdef CONFIG_PCI_MSI
 static irqreturn_t tsi721_bdma_msix(int irq, void *ptr);
 #endif
 static int tsi721_submit_sg(struct tsi721_tx_desc *desc);
 
 static unsigned int dma_desc_per_channel = 128;
-module_param(dma_desc_per_channel, uint, S_IWUSR | S_IRUGO);
+module_param(dma_desc_per_channel, uint, S_IRUGO);
 MODULE_PARM_DESC(dma_desc_per_channel,
 		 "Number of DMA descriptors per channel (default: 128)");
 
+static unsigned int dma_txqueue_sz = 16;
+module_param(dma_txqueue_sz, uint, S_IRUGO);
+MODULE_PARM_DESC(dma_txqueue_sz,
+		 "DMA Transactions Queue Size (default: 16)");
+
+static u8 dma_sel = 0x7f;
+module_param(dma_sel, byte, S_IRUGO);
+MODULE_PARM_DESC(dma_sel,
+		 "DMA Channel Selection Mask (default: 0x7f = all)");
+
 static inline struct tsi721_bdma_chan *to_tsi721_chan(struct dma_chan *chan)
 {
 	return container_of(chan, struct tsi721_bdma_chan, dchan);
@@ -732,7 +740,7 @@ static int tsi721_alloc_chan_resources(struct dma_chan *dchan)
 	tsi_debug(DMA, &dchan->dev->device, "DMAC%d", bdma_chan->id);
 
 	if (bdma_chan->bd_base)
-		return TSI721_DMA_TX_QUEUE_SZ;
+		return dma_txqueue_sz;
 
 	/* Initialize BDMA channel */
 	if (tsi721_bdma_ch_init(bdma_chan, dma_desc_per_channel)) {
@@ -742,7 +750,7 @@ static int tsi721_alloc_chan_resources(struct dma_chan *dchan)
 	}
 
 	/* Allocate queue of transaction descriptors */
-	desc = kcalloc(TSI721_DMA_TX_QUEUE_SZ, sizeof(struct tsi721_tx_desc),
+	desc = kcalloc(dma_txqueue_sz, sizeof(struct tsi721_tx_desc),
 			GFP_ATOMIC);
 	if (!desc) {
 		tsi_err(&dchan->dev->device,
@@ -754,7 +762,7 @@ static int tsi721_alloc_chan_resources(struct dma_chan *dchan)
 
 	bdma_chan->tx_desc = desc;
 
-	for (i = 0; i < TSI721_DMA_TX_QUEUE_SZ; i++) {
+	for (i = 0; i < dma_txqueue_sz; i++) {
 		dma_async_tx_descriptor_init(&desc[i].txd, dchan);
 		desc[i].txd.tx_submit = tsi721_tx_submit;
 		desc[i].txd.flags = DMA_CTRL_ACK;
@@ -766,7 +774,7 @@ static int tsi721_alloc_chan_resources(struct dma_chan *dchan)
 	bdma_chan->active = true;
 	tsi721_bdma_interrupt_enable(bdma_chan, 1);
 
-	return TSI721_DMA_TX_QUEUE_SZ;
+	return dma_txqueue_sz;
 }
 
 static void tsi721_sync_dma_irq(struct tsi721_bdma_chan *bdma_chan)
@@ -962,7 +970,7 @@ void tsi721_dma_stop_all(struct tsi721_device *priv)
 	int i;
 
 	for (i = 0; i < TSI721_DMA_MAXCH; i++) {
-		if (i != TSI721_DMACH_MAINT)
+		if ((i != TSI721_DMACH_MAINT) && (dma_sel & (1 << i)))
 			tsi721_dma_stop(&priv->bdma[i]);
 	}
 }
@@ -979,7 +987,7 @@ int tsi721_register_dma(struct tsi721_device *priv)
 	for (i = 0; i < TSI721_DMA_MAXCH; i++) {
 		struct tsi721_bdma_chan *bdma_chan = &priv->bdma[i];
 
-		if (i == TSI721_DMACH_MAINT)
+		if ((i == TSI721_DMACH_MAINT) || (dma_sel & (1 << i)) == 0)
 			continue;
 
 		bdma_chan->regs = priv->regs + TSI721_DMAC_BASE(i);

From cb782cdd2ffffbf7fd17e4aefb20f4db5c67caeb Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:40 -0700
Subject: [PATCH 093/111] rapidio/tsi721: add PCIe MRRS override parameter

Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.

Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes).  Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.

Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/rapidio/tsi721.txt |  7 +++++++
 drivers/rapidio/devices/tsi721.c | 16 +++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)
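
Note for reviewers: the pcie_mrrs encoding can be sketched in userspace as
below. The 0...5 codes map to 128B...4096B as listed in the documentation hunk,
and the value is placed in bits 14:12 of the PCIe Device Control register,
which is what the "pcie_mrrs << 12" in tsi721_probe() does. The macro and
function names are hypothetical. The mask value 0x7000 is intended to mirror
PCI_EXP_DEVCTL_READRQ.

```c
#include <stdint.h>

#define DEVCTL_READRQ_MASK	0x7000u	/* mirrors PCI_EXP_DEVCTL_READRQ */
#define DEVCTL_READRQ_SHIFT	12

/* Translate an MRRS code (0...5) into a size in bytes, -1 if invalid. */
int mrrs_code_to_bytes(int code)
{
	if (code < 0 || code > 5)
		return -1;

	return 128 << code;
}

/*
 * Return the Device Control value with the MRRS field replaced.
 * Out-of-range codes (including the default -1) leave it unchanged,
 * matching the "keep platform setting" behaviour.
 */
uint16_t devctl_apply_mrrs(uint16_t devctl, int code)
{
	if (code < 0 || code > 5)
		return devctl;

	return (uint16_t)((devctl & ~DEVCTL_READRQ_MASK) |
			  ((unsigned int)code << DEVCTL_READRQ_SHIFT));
}
```

Only the three MRRS bits change. All other Device Control bits keep the values
set during PCIe bus initialization.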

diff --git a/Documentation/rapidio/tsi721.txt b/Documentation/rapidio/tsi721.txt
index 0e0e90bef882..9c6ee3853793 100644
--- a/Documentation/rapidio/tsi721.txt
+++ b/Documentation/rapidio/tsi721.txt
@@ -39,6 +39,13 @@ fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
         DMA channels not selected by this mask will not be used by this device
         driver. Default value is 0x7f (use all channels).
 
+- 'pcie_mrrs' - override value for PCIe Maximum Read Request Size (MRRS).
+        This parameter gives an ability to override MRRS value set during PCIe
+        configuration process. Tsi721 supports read request sizes up to 4096B.
+        Value for this parameter must be set as defined by PCIe specification:
+        0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
+        Default value is '-1' (= keep platform setting).
+
 II. Known problems
 
   None.
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index 4c20e9927a7e..85098f8973a9 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -37,11 +37,15 @@
 #include "tsi721.h"
 
 #ifdef DEBUG
-u32 dbg_level = DBG_INIT | DBG_EXIT;
+u32 dbg_level;
 module_param(dbg_level, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(dbg_level, "Debugging output level (default 0 = none)");
 #endif
 
+static int pcie_mrrs = -1;
+module_param(pcie_mrrs, int, S_IRUGO);
+MODULE_PARM_DESC(pcie_mrrs, "PCIe MRRS override value (0...5)");
+
 static void tsi721_omsg_handler(struct tsi721_device *priv, int ch);
 static void tsi721_imsg_handler(struct tsi721_device *priv, int ch);
 
@@ -2840,6 +2844,16 @@ static int tsi721_probe(struct pci_dev *pdev,
 	pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL,
 		PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN, 0);
 
+	/* Override PCIe Maximum Read Request Size setting if requested */
+	if (pcie_mrrs >= 0) {
+		if (pcie_mrrs <= 5)
+			pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL,
+					PCI_EXP_DEVCTL_READRQ, pcie_mrrs << 12);
+		else
+			tsi_info(&pdev->dev,
+				 "Invalid MRRS override value %d", pcie_mrrs);
+	}
+
 	/* Adjust PCIe completion timeout. */
 	pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL2, 0xf, 0x2);
 

From e519685de3e44bb013d81f5ead04ac4b33c9b3a1 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:43 -0700
Subject: [PATCH 094/111] rapidio/tsi721: add messaging mbox selector parameter

Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX0 - MBOX3).

A messaging MBOX selector mask makes it possible to define which MBOXes are
controlled by the mport device driver and to reserve the others for
direct use by other drivers.

Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/rapidio/tsi721.txt |  5 +++++
 drivers/rapidio/devices/tsi721.c | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/rapidio/tsi721.txt b/Documentation/rapidio/tsi721.txt
index 9c6ee3853793..cd2a2935d51d 100644
--- a/Documentation/rapidio/tsi721.txt
+++ b/Documentation/rapidio/tsi721.txt
@@ -46,6 +46,11 @@ fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
         0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
         Default value is '-1' (= keep platform setting).
 
+- 'mbox_sel' - RIO messaging MBOX selection mask. This is a bitmask that defines
+        which messaging MBOXes are managed by this device driver. Mask bits
+        0 - 3 correspond to MBOX0 - MBOX3. An MBOX is under driver's control if
+        the corresponding bit is set to '1'. Default value is 0x0f (= all).
+
 II. Known problems
 
   None.
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index 85098f8973a9..8e07cd56abdc 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -46,6 +46,11 @@ static int pcie_mrrs = -1;
 module_param(pcie_mrrs, int, S_IRUGO);
 MODULE_PARM_DESC(pcie_mrrs, "PCIe MRRS override value (0...5)");
 
+static u8 mbox_sel = 0x0f;
+module_param(mbox_sel, byte, S_IRUGO);
+MODULE_PARM_DESC(mbox_sel,
+		 "RIO Messaging MBOX Selection Mask (default: 0x0f = all)");
+
 static void tsi721_omsg_handler(struct tsi721_device *priv, int ch);
 static void tsi721_imsg_handler(struct tsi721_device *priv, int ch);
 
@@ -1881,6 +1886,11 @@ static int tsi721_open_outb_mbox(struct rio_mport *mport, void *dev_id,
 		goto out;
 	}
 
+	if ((mbox_sel & (1 << mbox)) == 0) {
+		rc = -ENODEV;
+		goto out;
+	}
+
 	priv->omsg_ring[mbox].dev_id = dev_id;
 	priv->omsg_ring[mbox].size = entries;
 	priv->omsg_ring[mbox].sts_rdptr = 0;
@@ -2165,6 +2175,11 @@ static int tsi721_open_inb_mbox(struct rio_mport *mport, void *dev_id,
 		goto out;
 	}
 
+	if ((mbox_sel & (1 << mbox)) == 0) {
+		rc = -ENODEV;
+		goto out;
+	}
+
 	/* Initialize IB Messaging Ring */
 	priv->imsg_ring[mbox].dev_id = dev_id;
 	priv->imsg_ring[mbox].size = entries;

From f5485eb0b6eb8a3e5841cfea34a930822f7252bc Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:46 -0700
Subject: [PATCH 095/111] rapidio/tsi721_dma: advance queue processing from
 transfer submit call

Advance the transfer queue immediately from the transfer submit call.
This improves DMA performance: a transfer is started without waiting for
the 'issue_pending' command if no DMA transfer is in progress.

Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/devices/tsi721_dma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c
index 13c669bac019..e2a418598129 100644
--- a/drivers/rapidio/devices/tsi721_dma.c
+++ b/drivers/rapidio/devices/tsi721_dma.c
@@ -726,6 +726,7 @@ static dma_cookie_t tsi721_tx_submit(struct dma_async_tx_descriptor *txd)
 	cookie = dma_cookie_assign(txd);
 	desc->status = DMA_IN_PROGRESS;
 	list_add_tail(&desc->desc_node, &bdma_chan->queue);
+	tsi721_advance_work(bdma_chan, NULL);
 
 	spin_unlock_bh(&bdma_chan->lock);
 	return cookie;

From 06e1b2497ca4783f5f9997b09c77d93aeea69ec1 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:49 -0700
Subject: [PATCH 096/111] rapidio: fix error handling in mbox request/release
 functions

Check the error code returned by HW-specific mbox open routines and
ensure that resources are properly released on failure.

This patch is applicable to kernel versions starting from v2.6.15.
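
A minimal userspace sketch of the unwind pattern, under the assumption
that malloc()/free() stand in for the resource claim and release.  All
names here (mbox_slot, request_mbox, release_mbox, the hw_open hook) are
hypothetical and only mirror the shape of the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical mailbox slot; loosely models mport->inb_msg[mbox]. */
struct mbox_slot {
	void *res;		/* claimed resource, NULL when free */
	void (*mcback)(void);	/* message callback, NULL when free */
};

/* Hypothetical HW-specific open hook; returns 0 or a negative errno. */
typedef int (*open_fn)(int mbox);

static int request_mbox(struct mbox_slot *slot, int mbox,
			void (*cb)(void), open_fn hw_open)
{
	int rc;
	void *res = malloc(16);	/* stands in for the resource claim */

	if (!res)
		return -ENOMEM;

	slot->res = res;
	slot->mcback = cb;

	rc = hw_open(mbox);
	if (rc) {
		/* Undo everything so the slot does not look claimed. */
		slot->mcback = NULL;
		slot->res = NULL;
		free(res);
	}
	return rc;
}

static int release_mbox(struct mbox_slot *slot)
{
	if (!slot->res)
		return -EINVAL;	/* never opened, or already released */

	slot->mcback = NULL;
	free(slot->res);
	slot->res = NULL;
	return 0;
}

/* Hypothetical hooks used to exercise both paths. */
static int open_ok(int mbox) { (void)mbox; return 0; }
static int open_fail(int mbox) { (void)mbox; return -ENODEV; }
static void dummy_cb(void) { }
```

After a failed open the slot is left empty, so a later release attempt
is rejected instead of touching a stale resource pointer.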

Link: http://lkml.kernel.org/r/1469125134-16523-9-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/rio.c | 54 +++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
index 840802943dc0..1cd32603259f 100644
--- a/drivers/rapidio/rio.c
+++ b/drivers/rapidio/rio.c
@@ -268,6 +268,12 @@ int rio_request_inb_mbox(struct rio_mport *mport,
 		mport->inb_msg[mbox].mcback = minb;
 
 		rc = mport->ops->open_inb_mbox(mport, dev_id, mbox, entries);
+		if (rc) {
+			mport->inb_msg[mbox].mcback = NULL;
+			mport->inb_msg[mbox].res = NULL;
+			release_resource(res);
+			kfree(res);
+		}
 	} else
 		rc = -ENOMEM;
 
@@ -285,13 +291,22 @@ int rio_request_inb_mbox(struct rio_mport *mport,
  */
 int rio_release_inb_mbox(struct rio_mport *mport, int mbox)
 {
-	if (mport->ops->close_inb_mbox) {
-		mport->ops->close_inb_mbox(mport, mbox);
+	int rc;
 
-		/* Release the mailbox resource */
-		return release_resource(mport->inb_msg[mbox].res);
-	} else
-		return -ENOSYS;
+	if (!mport->ops->close_inb_mbox || !mport->inb_msg[mbox].res)
+		return -EINVAL;
+
+	mport->ops->close_inb_mbox(mport, mbox);
+	mport->inb_msg[mbox].mcback = NULL;
+
+	rc = release_resource(mport->inb_msg[mbox].res);
+	if (rc)
+		return rc;
+
+	kfree(mport->inb_msg[mbox].res);
+	mport->inb_msg[mbox].res = NULL;
+
+	return 0;
 }
 
 /**
@@ -336,6 +351,12 @@ int rio_request_outb_mbox(struct rio_mport *mport,
 		mport->outb_msg[mbox].mcback = moutb;
 
 		rc = mport->ops->open_outb_mbox(mport, dev_id, mbox, entries);
+		if (rc) {
+			mport->outb_msg[mbox].mcback = NULL;
+			mport->outb_msg[mbox].res = NULL;
+			release_resource(res);
+			kfree(res);
+		}
 	} else
 		rc = -ENOMEM;
 
@@ -353,13 +374,22 @@ int rio_request_outb_mbox(struct rio_mport *mport,
  */
 int rio_release_outb_mbox(struct rio_mport *mport, int mbox)
 {
-	if (mport->ops->close_outb_mbox) {
-		mport->ops->close_outb_mbox(mport, mbox);
+	int rc;
 
-		/* Release the mailbox resource */
-		return release_resource(mport->outb_msg[mbox].res);
-	} else
-		return -ENOSYS;
+	if (!mport->ops->close_outb_mbox || !mport->outb_msg[mbox].res)
+		return -EINVAL;
+
+	mport->ops->close_outb_mbox(mport, mbox);
+	mport->outb_msg[mbox].mcback = NULL;
+
+	rc = release_resource(mport->outb_msg[mbox].res);
+	if (rc)
+		return rc;
+
+	kfree(mport->outb_msg[mbox].res);
+	mport->outb_msg[mbox].res = NULL;
+
+	return 0;
 }
 
 /**

From 60e377b5c1226d6737786947d0e915ab45d7f188 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:52 -0700
Subject: [PATCH 097/111] rapidio/idt_gen2: fix locking warning

Fix lockdep warning during device probing: move sysfs initialization out
of code protected by a spin lock.
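
The reordering can be pictured with a small userspace model.  It assumes
the warning comes from calling something that may sleep while the lock
is held; the flag-based "lock" and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool lock_held;	/* models being inside a spin-locked section */
static int violations;	/* may-sleep calls made while the lock is held */

static void model_lock(void)   { lock_held = true; }
static void model_unlock(void) { lock_held = false; }

/* Models a call that may sleep, such as creating sysfs attributes. */
static void may_sleep_setup(void)
{
	if (lock_held)
		violations++;
}

/* Ordering before the fix: the may-sleep call runs under the lock. */
static void probe_before(void)
{
	model_lock();
	may_sleep_setup();
	model_unlock();
}

/* Ordering after the fix: unlock first, then do the may-sleep work. */
static void probe_after(void)
{
	model_lock();
	/* ... setup that really needs the lock would go here ... */
	model_unlock();
	may_sleep_setup();
}
```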

Link: http://lkml.kernel.org/r/1469125134-16523-10-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/switches/idt_gen2.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/rapidio/switches/idt_gen2.c b/drivers/rapidio/switches/idt_gen2.c
index 9f7fe21580bb..e67b923b1ca6 100644
--- a/drivers/rapidio/switches/idt_gen2.c
+++ b/drivers/rapidio/switches/idt_gen2.c
@@ -436,10 +436,11 @@ static int idtg2_probe(struct rio_dev *rdev, const struct rio_device_id *id)
 				    RIO_STD_RTE_DEFAULT_PORT, IDT_NO_ROUTE);
 	}
 
+	spin_unlock(&rdev->rswitch->lock);
+
 	/* Create device-specific sysfs attributes */
 	idtg2_sysfs(rdev, true);
 
-	spin_unlock(&rdev->rswitch->lock);
 	return 0;
 }
 
@@ -452,11 +453,9 @@ static void idtg2_remove(struct rio_dev *rdev)
 		return;
 	}
 	rdev->rswitch->ops = NULL;
-
+	spin_unlock(&rdev->rswitch->lock);
 	/* Remove device-specific sysfs attributes */
 	idtg2_sysfs(rdev, false);
-
-	spin_unlock(&rdev->rswitch->lock);
 }
 
 static struct rio_device_id idtg2_id_table[] = {

From a057a52e94e15d89be8af557584e0144a496b6c6 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:54 -0700
Subject: [PATCH 098/111] rapidio: change inbound window size type to u64

The current definition of the map_inb() mport operations callback uses
the u32 type to specify the required inbound window (IBW) size.  This is
a limiting factor because existing hardware, tsi721 and fsl_rio, supports
IBW sizes up to 16GB.

Change the type of the size parameter to u64 to allow IBW size
configurations larger than 4GB.
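
A small sketch of why the wider type matters, written as standalone C.
The helper names are hypothetical, and the power-of-two and non-zero
checks are assumptions of this sketch rather than a copy of either
driver's validation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define IBW_MAX_SIZE 0x400000000ULL	/* 16GB, the limit named above */

/* Returns 0 if size is a non-zero power of two no larger than 16GB. */
static int check_ibw_size(uint64_t size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return -EINVAL;
	if (size > IBW_MAX_SIZE)
		return -EINVAL;
	return 0;
}

/* What a u32 parameter would have received for the same request. */
static uint32_t as_u32(uint64_t size)
{
	return (uint32_t)size;
}
```

An 8GB request (0x200000000) passes the 64-bit check, but truncates to
zero when squeezed through a 32-bit parameter.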

[alexandre.bounine@idt.com: remove compiler warning about size of constant]
  Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/powerpc/sysdev/fsl_rio.c    |  4 ++--
 drivers/rapidio/devices/tsi721.c | 14 +++++++++-----
 include/linux/rio.h              |  2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index f5bf38b94595..386790cfa16e 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -289,7 +289,7 @@ static void fsl_rio_inbound_mem_init(struct rio_priv *priv)
 }
 
 int fsl_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
-	u64 rstart, u32 size, u32 flags)
+	u64 rstart, u64 size, u32 flags)
 {
 	struct rio_priv *priv = mport->priv;
 	u32 base_size;
@@ -298,7 +298,7 @@ int fsl_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 	u32 riwar;
 	int i;
 
-	if ((size & (size - 1)) != 0)
+	if ((size & (size - 1)) != 0 || size > 0x400000000ULL)
 		return -EINVAL;
 
 	base_size_log = ilog2(size);
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index 8e07cd56abdc..53daf634a1ac 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -1090,7 +1090,7 @@ static void tsi721_init_pc2sr_mapping(struct tsi721_device *priv)
  * from rstart to lstart.
  */
 static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
-		u64 rstart, u32 size, u32 flags)
+		u64 rstart, u64 size, u32 flags)
 {
 	struct tsi721_device *priv = mport->priv;
 	int i, avail = -1;
@@ -1103,6 +1103,10 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 	struct tsi721_ib_win_mapping *map = NULL;
 	int ret = -EBUSY;
 
+	/* Max IBW size supported by HW is 16GB */
+	if (size > 0x400000000UL)
+		return -EINVAL;
+
 	if (direct) {
 		/* Calculate minimal acceptable window size and base address */
 
@@ -1110,15 +1114,15 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 		ibw_start = lstart & ~(ibw_size - 1);
 
 		tsi_debug(IBW, &priv->pdev->dev,
-			"Direct (RIO_0x%llx -> PCIe_%pad), size=0x%x, ibw_start = 0x%llx",
+			"Direct (RIO_0x%llx -> PCIe_%pad), size=0x%llx, ibw_start = 0x%llx",
 			rstart, &lstart, size, ibw_start);
 
 		while ((lstart + size) > (ibw_start + ibw_size)) {
 			ibw_size *= 2;
 			ibw_start = lstart & ~(ibw_size - 1);
-			if (ibw_size > 0x80000000) { /* Limit max size to 2GB */
+			/* Check for crossing IBW max size 16GB */
+			if (ibw_size > 0x400000000UL)
 				return -EBUSY;
-			}
 		}
 
 		loc_start = ibw_start;
@@ -1129,7 +1133,7 @@ static int tsi721_rio_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
 
 	} else {
 		tsi_debug(IBW, &priv->pdev->dev,
-			"Translated (RIO_0x%llx -> PCIe_%pad), size=0x%x",
+			"Translated (RIO_0x%llx -> PCIe_%pad), size=0x%llx",
 			rstart, &lstart, size);
 
 		if (!is_power_of_2(size) || size < 0x1000 ||
diff --git a/include/linux/rio.h b/include/linux/rio.h
index aa2323893e8d..f7ec35b48800 100644
--- a/include/linux/rio.h
+++ b/include/linux/rio.h
@@ -425,7 +425,7 @@ struct rio_ops {
 	int (*add_inb_buffer)(struct rio_mport *mport, int mbox, void *buf);
 	void *(*get_inb_message)(struct rio_mport *mport, int mbox);
 	int (*map_inb)(struct rio_mport *mport, dma_addr_t lstart,
-			u64 rstart, u32 size, u32 flags);
+			u64 rstart, u64 size, u32 flags);
 	void (*unmap_inb)(struct rio_mport *mport, dma_addr_t lstart);
 	int (*query_mport)(struct rio_mport *mport,
 			   struct rio_mport_attr *attr);

From 1ae842de1dd8051cbb65b396b6f029d07f992641 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:06:57 -0700
Subject: [PATCH 099/111] rapidio: modify for rev.3 specification changes

Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:

 - use per-port register offset calculations based on LP-Serial Extended
   Features Block (EFB) Register Map type (I or II) with different
   per-port offset step (0x20 vs 0x40 respectively).

 - remove deprecated Parallel Physical layer definitions and related
   code.
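
For illustration, the per-port offset calculation can be sketched as
below.  The helper name and the reg_base argument are hypothetical; only
the two step values come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Per-port step for Register Map type I and II, per the text above. */
#define STEP_RMAP1	0x20u
#define STEP_RMAP2	0x40u

/*
 * Hypothetical helper: offset of a per-port register relative to the
 * LP-Serial EFB.  reg_base is the offset of the port 0 instance and is
 * an assumption of this sketch, not a value taken from the patch.
 */
static uint32_t port_reg_offset(uint32_t reg_base, uint32_t port,
				uint32_t rmap)
{
	uint32_t step = (rmap == 2) ? STEP_RMAP2 : STEP_RMAP1;

	return reg_base + port * step;
}
```

With the same base, port 3 lands 0x60 past port 0 on a type I map and
0xc0 past it on a type II map.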

[alexandre.bounine@idt.com: fix DocBook warning for gen3 update]
  Link: http://lkml.kernel.org/r/1469191173-19338-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-12-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/devices/rio_mport_cdev.c |   2 +-
 drivers/rapidio/devices/tsi721.c         |   8 +-
 drivers/rapidio/rio-scan.c               |  74 +++-------
 drivers/rapidio/rio.c                    | 150 ++++++++++----------
 drivers/rapidio/rio.h                    |   2 +-
 drivers/rapidio/switches/tsi57x.c        |  26 ++--
 include/linux/rio.h                      |  11 +-
 include/linux/rio_regs.h                 | 167 ++++++++++++++++++-----
 8 files changed, 249 insertions(+), 191 deletions(-)

diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index de0c692587dd..436dfe871d32 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -1813,7 +1813,7 @@ static int rio_mport_add_riodev(struct mport_cdev_priv *priv,
 	if (rdev->pef & RIO_PEF_EXT_FEATURES) {
 		rdev->efptr = rval & 0xffff;
 		rdev->phys_efptr = rio_mport_get_physefb(mport, 0, destid,
-							 hopcount);
+						hopcount, &rdev->phys_rmap);
 
 		rdev->em_efptr = rio_mport_get_feature(mport, 0, destid,
 						hopcount, RIO_EFB_ERR_MGMNT);
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index 53daf634a1ac..32f0f014a067 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -2555,11 +2555,11 @@ static int tsi721_query_mport(struct rio_mport *mport,
 	struct tsi721_device *priv = mport->priv;
 	u32 rval;
 
-	rval = ioread32(priv->regs + (0x100 + RIO_PORT_N_ERR_STS_CSR(0)));
+	rval = ioread32(priv->regs + 0x100 + RIO_PORT_N_ERR_STS_CSR(0, 0));
 	if (rval & RIO_PORT_N_ERR_STS_PORT_OK) {
-		rval = ioread32(priv->regs + (0x100 + RIO_PORT_N_CTL2_CSR(0)));
+		rval = ioread32(priv->regs + 0x100 + RIO_PORT_N_CTL2_CSR(0, 0));
 		attr->link_speed = (rval & RIO_PORT_N_CTL2_SEL_BAUD) >> 28;
-		rval = ioread32(priv->regs + (0x100 + RIO_PORT_N_CTL_CSR(0)));
+		rval = ioread32(priv->regs + 0x100 + RIO_PORT_N_CTL_CSR(0, 0));
 		attr->link_width = (rval & RIO_PORT_N_CTL_IPW) >> 27;
 	} else
 		attr->link_speed = RIO_LINK_DOWN;
@@ -2673,9 +2673,9 @@ static int tsi721_setup_mport(struct tsi721_device *priv)
 	mport->ops = &tsi721_rio_ops;
 	mport->index = 0;
 	mport->sys_size = 0; /* small system */
-	mport->phy_type = RIO_PHY_SERIAL;
 	mport->priv = (void *)priv;
 	mport->phys_efptr = 0x100;
+	mport->phys_rmap = 1;
 	mport->dev.parent = &pdev->dev;
 	mport->dev.release = tsi721_mport_release;
 
diff --git a/drivers/rapidio/rio-scan.c b/drivers/rapidio/rio-scan.c
index a63a380809d1..23429bdaca84 100644
--- a/drivers/rapidio/rio-scan.c
+++ b/drivers/rapidio/rio-scan.c
@@ -49,15 +49,6 @@ struct rio_id_table {
 static int next_destid = 0;
 static int next_comptag = 1;
 
-static int rio_mport_phys_table[] = {
-	RIO_EFB_PAR_EP_ID,
-	RIO_EFB_PAR_EP_REC_ID,
-	RIO_EFB_SER_EP_ID,
-	RIO_EFB_SER_EP_REC_ID,
-	-1,
-};
-
-
 /**
  * rio_destid_alloc - Allocate next available destID for given network
  * @net: RIO network
@@ -380,10 +371,15 @@ static struct rio_dev *rio_setup_device(struct rio_net *net,
 	if (rdev->pef & RIO_PEF_EXT_FEATURES) {
 		rdev->efptr = result & 0xffff;
 		rdev->phys_efptr = rio_mport_get_physefb(port, 0, destid,
-							 hopcount);
+						hopcount, &rdev->phys_rmap);
+		pr_debug("RIO: %s Register Map %d device\n",
+			 __func__, rdev->phys_rmap);
 
 		rdev->em_efptr = rio_mport_get_feature(port, 0, destid,
 						hopcount, RIO_EFB_ERR_MGMNT);
+		if (!rdev->em_efptr)
+			rdev->em_efptr = rio_mport_get_feature(port, 0, destid,
+						hopcount, RIO_EFB_ERR_MGMNT_HS);
 	}
 
 	rio_mport_read_config_32(port, destid, hopcount, RIO_SRC_OPS_CAR,
@@ -445,7 +441,7 @@ static struct rio_dev *rio_setup_device(struct rio_net *net,
 			rio_route_clr_table(rdev, RIO_GLOBAL_TABLE, 0);
 	} else {
 		if (do_enum)
-			/*Enable Input Output Port (transmitter reviever)*/
+			/*Enable Input Output Port (transmitter receiver)*/
 			rio_enable_rx_tx_port(port, 0, destid, hopcount, 0);
 
 		dev_set_name(&rdev->dev, "%02x:e:%04x", rdev->net->id,
@@ -481,10 +477,8 @@ cleanup:
 
 /**
  * rio_sport_is_active- Tests if a switch port has an active connection.
- * @port: Master port to send transaction
- * @destid: Associated destination ID for switch
- * @hopcount: Hopcount to reach switch
- * @sport: Switch port number
+ * @rdev: RapidIO device object
+ * @sp: Switch port number
  *
  * Reads the port error status CSR for a particular switch port to
  * determine if the port has an active link.  Returns
@@ -492,31 +486,12 @@ cleanup:
  * inactive.
  */
 static int
-rio_sport_is_active(struct rio_mport *port, u16 destid, u8 hopcount, int sport)
+rio_sport_is_active(struct rio_dev *rdev, int sp)
 {
 	u32 result = 0;
-	u32 ext_ftr_ptr;
 
-	ext_ftr_ptr = rio_mport_get_efb(port, 0, destid, hopcount, 0);
-
-	while (ext_ftr_ptr) {
-		rio_mport_read_config_32(port, destid, hopcount,
-					 ext_ftr_ptr, &result);
-		result = RIO_GET_BLOCK_ID(result);
-		if ((result == RIO_EFB_SER_EP_FREE_ID) ||
-		    (result == RIO_EFB_SER_EP_FREE_ID_V13P) ||
-		    (result == RIO_EFB_SER_EP_FREC_ID))
-			break;
-
-		ext_ftr_ptr = rio_mport_get_efb(port, 0, destid, hopcount,
-						ext_ftr_ptr);
-	}
-
-	if (ext_ftr_ptr)
-		rio_mport_read_config_32(port, destid, hopcount,
-					 ext_ftr_ptr +
-					 RIO_PORT_N_ERR_STS_CSR(sport),
-					 &result);
+	rio_read_config_32(rdev, RIO_DEV_PORT_N_ERR_STS_CSR(rdev, sp),
+			   &result);
 
 	return result & RIO_PORT_N_ERR_STS_PORT_OK;
 }
@@ -655,9 +630,7 @@ static int rio_enum_peer(struct rio_net *net, struct rio_mport *port,
 
 			cur_destid = next_destid;
 
-			if (rio_sport_is_active
-			    (port, RIO_ANY_DESTID(port->sys_size), hopcount,
-			     port_num)) {
+			if (rio_sport_is_active(rdev, port_num)) {
 				pr_debug(
 				    "RIO: scanning device on port %d\n",
 				    port_num);
@@ -785,8 +758,7 @@ rio_disc_peer(struct rio_net *net, struct rio_mport *port, u16 destid,
 			if (RIO_GET_PORT_NUM(rdev->swpinfo) == port_num)
 				continue;
 
-			if (rio_sport_is_active
-			    (port, destid, hopcount, port_num)) {
+			if (rio_sport_is_active(rdev, port_num)) {
 				pr_debug(
 				    "RIO: scanning device on port %d\n",
 				    port_num);
@@ -831,21 +803,11 @@ rio_disc_peer(struct rio_net *net, struct rio_mport *port, u16 destid,
 static int rio_mport_is_active(struct rio_mport *port)
 {
 	u32 result = 0;
-	u32 ext_ftr_ptr;
-	int *entry = rio_mport_phys_table;
-
-	do {
-		if ((ext_ftr_ptr =
-		     rio_mport_get_feature(port, 1, 0, 0, *entry)))
-			break;
-	} while (*++entry >= 0);
-
-	if (ext_ftr_ptr)
-		rio_local_read_config_32(port,
-					 ext_ftr_ptr +
-					 RIO_PORT_N_ERR_STS_CSR(port->index),
-					 &result);
 
+	rio_local_read_config_32(port,
+		port->phys_efptr +
+			RIO_PORT_N_ERR_STS_CSR(port->index, port->phys_rmap),
+		&result);
 	return result & RIO_PORT_N_ERR_STS_PORT_OK;
 }
 
diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
index 1cd32603259f..37042858c2db 100644
--- a/drivers/rapidio/rio.c
+++ b/drivers/rapidio/rio.c
@@ -786,10 +786,11 @@ EXPORT_SYMBOL_GPL(rio_unmap_outb_region);
  * @local: Indicate a local master port or remote device access
  * @destid: Destination ID of the device
  * @hopcount: Number of switch hops to the device
+ * @rmap: pointer to location to store register map type info
  */
 u32
 rio_mport_get_physefb(struct rio_mport *port, int local,
-		      u16 destid, u8 hopcount)
+		      u16 destid, u8 hopcount, u32 *rmap)
 {
 	u32 ext_ftr_ptr;
 	u32 ftr_header;
@@ -807,14 +808,21 @@ rio_mport_get_physefb(struct rio_mport *port, int local,
 		ftr_header = RIO_GET_BLOCK_ID(ftr_header);
 		switch (ftr_header) {
 
-		case RIO_EFB_SER_EP_ID_V13P:
-		case RIO_EFB_SER_EP_REC_ID_V13P:
-		case RIO_EFB_SER_EP_FREE_ID_V13P:
 		case RIO_EFB_SER_EP_ID:
 		case RIO_EFB_SER_EP_REC_ID:
 		case RIO_EFB_SER_EP_FREE_ID:
-		case RIO_EFB_SER_EP_FREC_ID:
+		case RIO_EFB_SER_EP_M1_ID:
+		case RIO_EFB_SER_EP_SW_M1_ID:
+		case RIO_EFB_SER_EPF_M1_ID:
+		case RIO_EFB_SER_EPF_SW_M1_ID:
+			*rmap = 1;
+			return ext_ftr_ptr;
 
+		case RIO_EFB_SER_EP_M2_ID:
+		case RIO_EFB_SER_EP_SW_M2_ID:
+		case RIO_EFB_SER_EPF_M2_ID:
+		case RIO_EFB_SER_EPF_SW_M2_ID:
+			*rmap = 2;
 			return ext_ftr_ptr;
 
 		default:
@@ -873,16 +881,16 @@ int rio_set_port_lockout(struct rio_dev *rdev, u32 pnum, int lock)
 	u32 regval;
 
 	rio_read_config_32(rdev,
-				 rdev->phys_efptr + RIO_PORT_N_CTL_CSR(pnum),
-				 &regval);
+		RIO_DEV_PORT_N_CTL_CSR(rdev, pnum),
+		&regval);
 	if (lock)
 		regval |= RIO_PORT_N_CTL_LOCKOUT;
 	else
 		regval &= ~RIO_PORT_N_CTL_LOCKOUT;
 
 	rio_write_config_32(rdev,
-				  rdev->phys_efptr + RIO_PORT_N_CTL_CSR(pnum),
-				  regval);
+		RIO_DEV_PORT_N_CTL_CSR(rdev, pnum),
+		regval);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(rio_set_port_lockout);
@@ -906,6 +914,7 @@ int rio_enable_rx_tx_port(struct rio_mport *port,
 #ifdef CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS
 	u32 regval;
 	u32 ext_ftr_ptr;
+	u32 rmap;
 
 	/*
 	* enable rx input tx output port
@@ -913,34 +922,29 @@ int rio_enable_rx_tx_port(struct rio_mport *port,
 	pr_debug("rio_enable_rx_tx_port(local = %d, destid = %d, hopcount = "
 		 "%d, port_num = %d)\n", local, destid, hopcount, port_num);
 
-	ext_ftr_ptr = rio_mport_get_physefb(port, local, destid, hopcount);
+	ext_ftr_ptr = rio_mport_get_physefb(port, local, destid,
+					    hopcount, &rmap);
 
 	if (local) {
-		rio_local_read_config_32(port, ext_ftr_ptr +
-				RIO_PORT_N_CTL_CSR(0),
+		rio_local_read_config_32(port,
+				ext_ftr_ptr + RIO_PORT_N_CTL_CSR(0, rmap),
 				&regval);
 	} else {
 		if (rio_mport_read_config_32(port, destid, hopcount,
-		ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), &regval) < 0)
+			ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num, rmap),
+				&regval) < 0)
 			return -EIO;
 	}
 
-	if (regval & RIO_PORT_N_CTL_P_TYP_SER) {
-		/* serial */
-		regval = regval | RIO_PORT_N_CTL_EN_RX_SER
-				| RIO_PORT_N_CTL_EN_TX_SER;
-	} else {
-		/* parallel */
-		regval = regval | RIO_PORT_N_CTL_EN_RX_PAR
-				| RIO_PORT_N_CTL_EN_TX_PAR;
-	}
+	regval = regval | RIO_PORT_N_CTL_EN_RX | RIO_PORT_N_CTL_EN_TX;
 
 	if (local) {
-		rio_local_write_config_32(port, ext_ftr_ptr +
-					  RIO_PORT_N_CTL_CSR(0), regval);
+		rio_local_write_config_32(port,
+			ext_ftr_ptr + RIO_PORT_N_CTL_CSR(0, rmap), regval);
 	} else {
 		if (rio_mport_write_config_32(port, destid, hopcount,
-		    ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num), regval) < 0)
+			ext_ftr_ptr + RIO_PORT_N_CTL_CSR(port_num, rmap),
+				regval) < 0)
 			return -EIO;
 	}
 #endif
@@ -1042,14 +1046,14 @@ rio_get_input_status(struct rio_dev *rdev, int pnum, u32 *lnkresp)
 		/* Read from link maintenance response register
 		 * to clear valid bit */
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_MNT_RSP_CSR(pnum),
+			RIO_DEV_PORT_N_MNT_RSP_CSR(rdev, pnum),
 			&regval);
 		udelay(50);
 	}
 
 	/* Issue Input-status command */
 	rio_write_config_32(rdev,
-		rdev->phys_efptr + RIO_PORT_N_MNT_REQ_CSR(pnum),
+		RIO_DEV_PORT_N_MNT_REQ_CSR(rdev, pnum),
 		RIO_MNT_REQ_CMD_IS);
 
 	/* Exit if the response is not expected */
@@ -1060,7 +1064,7 @@ rio_get_input_status(struct rio_dev *rdev, int pnum, u32 *lnkresp)
 	while (checkcount--) {
 		udelay(50);
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_MNT_RSP_CSR(pnum),
+			RIO_DEV_PORT_N_MNT_RSP_CSR(rdev, pnum),
 			&regval);
 		if (regval & RIO_PORT_N_MNT_RSP_RVAL) {
 			*lnkresp = regval;
@@ -1076,6 +1080,13 @@ rio_get_input_status(struct rio_dev *rdev, int pnum, u32 *lnkresp)
  * @rdev: Pointer to RIO device control structure
  * @pnum: Switch port number to clear errors
  * @err_status: port error status (if 0 reads register from device)
+ *
+ * TODO: Currently this routine is not compatible with recovery process
+ * specified for idt_gen3 RapidIO switch devices. It has to be reviewed
+ * to implement a universal recovery process that is compatible with the
+ * full range of available devices.
+ * IDT gen3 switch driver now implements HW-specific error handler that
+ * issues soft port reset to the port to reset ERR_STOP bits and ackIDs.
  */
 static int rio_clr_err_stopped(struct rio_dev *rdev, u32 pnum, u32 err_status)
 {
@@ -1085,10 +1096,10 @@ static int rio_clr_err_stopped(struct rio_dev *rdev, u32 pnum, u32 err_status)
 
 	if (err_status == 0)
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(pnum),
+			RIO_DEV_PORT_N_ERR_STS_CSR(rdev, pnum),
 			&err_status);
 
-	if (err_status & RIO_PORT_N_ERR_STS_PW_OUT_ES) {
+	if (err_status & RIO_PORT_N_ERR_STS_OUT_ES) {
 		pr_debug("RIO_EM: servicing Output Error-Stopped state\n");
 		/*
 		 * Send a Link-Request/Input-Status control symbol
@@ -1103,7 +1114,7 @@ static int rio_clr_err_stopped(struct rio_dev *rdev, u32 pnum, u32 err_status)
 		far_ackid = (regval & RIO_PORT_N_MNT_RSP_ASTAT) >> 5;
 		far_linkstat = regval & RIO_PORT_N_MNT_RSP_LSTAT;
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ACK_STS_CSR(pnum),
+			RIO_DEV_PORT_N_ACK_STS_CSR(rdev, pnum),
 			&regval);
 		pr_debug("RIO_EM: SP%d_ACK_STS_CSR=0x%08x\n", pnum, regval);
 		near_ackid = (regval & RIO_PORT_N_ACK_INBOUND) >> 24;
@@ -1121,43 +1132,43 @@ static int rio_clr_err_stopped(struct rio_dev *rdev, u32 pnum, u32 err_status)
 			 * far inbound.
 			 */
 			rio_write_config_32(rdev,
-				rdev->phys_efptr + RIO_PORT_N_ACK_STS_CSR(pnum),
+				RIO_DEV_PORT_N_ACK_STS_CSR(rdev, pnum),
 				(near_ackid << 24) |
 					(far_ackid << 8) | far_ackid);
 			/* Align far outstanding/outbound ackIDs with
 			 * near inbound.
 			 */
 			far_ackid++;
-			if (nextdev)
-				rio_write_config_32(nextdev,
-					nextdev->phys_efptr +
-					RIO_PORT_N_ACK_STS_CSR(RIO_GET_PORT_NUM(nextdev->swpinfo)),
-					(far_ackid << 24) |
-					(near_ackid << 8) | near_ackid);
-			else
-				pr_debug("RIO_EM: Invalid nextdev pointer (NULL)\n");
+			if (!nextdev) {
+				pr_debug("RIO_EM: nextdev pointer == NULL\n");
+				goto rd_err;
+			}
+
+			rio_write_config_32(nextdev,
+				RIO_DEV_PORT_N_ACK_STS_CSR(nextdev,
+					RIO_GET_PORT_NUM(nextdev->swpinfo)),
+				(far_ackid << 24) |
+				(near_ackid << 8) | near_ackid);
 		}
 rd_err:
-		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(pnum),
-			&err_status);
+		rio_read_config_32(rdev, RIO_DEV_PORT_N_ERR_STS_CSR(rdev, pnum),
+				   &err_status);
 		pr_debug("RIO_EM: SP%d_ERR_STS_CSR=0x%08x\n", pnum, err_status);
 	}
 
-	if ((err_status & RIO_PORT_N_ERR_STS_PW_INP_ES) && nextdev) {
+	if ((err_status & RIO_PORT_N_ERR_STS_INP_ES) && nextdev) {
 		pr_debug("RIO_EM: servicing Input Error-Stopped state\n");
 		rio_get_input_status(nextdev,
 				     RIO_GET_PORT_NUM(nextdev->swpinfo), NULL);
 		udelay(50);
 
-		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(pnum),
-			&err_status);
+		rio_read_config_32(rdev, RIO_DEV_PORT_N_ERR_STS_CSR(rdev, pnum),
+				   &err_status);
 		pr_debug("RIO_EM: SP%d_ERR_STS_CSR=0x%08x\n", pnum, err_status);
 	}
 
-	return (err_status & (RIO_PORT_N_ERR_STS_PW_OUT_ES |
-			      RIO_PORT_N_ERR_STS_PW_INP_ES)) ? 1 : 0;
+	return (err_status & (RIO_PORT_N_ERR_STS_OUT_ES |
+			      RIO_PORT_N_ERR_STS_INP_ES)) ? 1 : 0;
 }
 
 /**
@@ -1257,9 +1268,8 @@ int rio_inb_pwrite_handler(struct rio_mport *mport, union rio_pw_msg *pw_msg)
 	if (rdev->rswitch->ops && rdev->rswitch->ops->em_handle)
 		rdev->rswitch->ops->em_handle(rdev, portnum);
 
-	rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(portnum),
-			&err_status);
+	rio_read_config_32(rdev, RIO_DEV_PORT_N_ERR_STS_CSR(rdev, portnum),
+			   &err_status);
 	pr_debug("RIO_PW: SP%d_ERR_STS_CSR=0x%08x\n", portnum, err_status);
 
 	if (err_status & RIO_PORT_N_ERR_STS_PORT_OK) {
@@ -1276,8 +1286,8 @@ int rio_inb_pwrite_handler(struct rio_mport *mport, union rio_pw_msg *pw_msg)
 		 * Depending on the link partner state, two attempts
 		 * may be needed for successful recovery.
 		 */
-		if (err_status & (RIO_PORT_N_ERR_STS_PW_OUT_ES |
-				  RIO_PORT_N_ERR_STS_PW_INP_ES)) {
+		if (err_status & (RIO_PORT_N_ERR_STS_OUT_ES |
+				  RIO_PORT_N_ERR_STS_INP_ES)) {
 			if (rio_clr_err_stopped(rdev, portnum, err_status))
 				rio_clr_err_stopped(rdev, portnum, 0);
 		}
@@ -1287,10 +1297,18 @@ int rio_inb_pwrite_handler(struct rio_mport *mport, union rio_pw_msg *pw_msg)
 			rdev->rswitch->port_ok &= ~(1 << portnum);
 			rio_set_port_lockout(rdev, portnum, 1);
 
+			if (rdev->phys_rmap == 1) {
 			rio_write_config_32(rdev,
-				rdev->phys_efptr +
-					RIO_PORT_N_ACK_STS_CSR(portnum),
+				RIO_DEV_PORT_N_ACK_STS_CSR(rdev, portnum),
 				RIO_PORT_N_ACK_CLEAR);
+			} else {
+				rio_write_config_32(rdev,
+					RIO_DEV_PORT_N_OB_ACK_CSR(rdev, portnum),
+					RIO_PORT_N_OB_ACK_CLEAR);
+				rio_write_config_32(rdev,
+					RIO_DEV_PORT_N_IB_ACK_CSR(rdev, portnum),
+					0);
+			}
 
 			/* Schedule Extraction Service */
 			pr_debug("RIO_PW: Device Extraction on [%s]-P%d\n",
@@ -1319,9 +1337,8 @@ int rio_inb_pwrite_handler(struct rio_mport *mport, union rio_pw_msg *pw_msg)
 	}
 
 	/* Clear remaining error bits and Port-Write Pending bit */
-	rio_write_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(portnum),
-			err_status);
+	rio_write_config_32(rdev, RIO_DEV_PORT_N_ERR_STS_CSR(rdev, portnum),
+			    err_status);
 
 	return 0;
 }
@@ -1372,20 +1389,7 @@ EXPORT_SYMBOL_GPL(rio_mport_get_efb);
  * Tell if a device supports a given RapidIO capability.
  * Returns the offset of the requested extended feature
  * block within the device's RIO configuration space or
- * 0 in case the device does not support it.  Possible
- * values for @ftr:
- *
- * %RIO_EFB_PAR_EP_ID		LP/LVDS EP Devices
- *
- * %RIO_EFB_PAR_EP_REC_ID	LP/LVDS EP Recovery Devices
- *
- * %RIO_EFB_PAR_EP_FREE_ID	LP/LVDS EP Free Devices
- *
- * %RIO_EFB_SER_EP_ID		LP/Serial EP Devices
- *
- * %RIO_EFB_SER_EP_REC_ID	LP/Serial EP Recovery Devices
- *
- * %RIO_EFB_SER_EP_FREE_ID	LP/Serial EP Free Devices
+ * 0 in case the device does not support it.
  */
 u32
 rio_mport_get_feature(struct rio_mport * port, int local, u16 destid,
diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h
index 625d09add001..9796b3fee70d 100644
--- a/drivers/rapidio/rio.h
+++ b/drivers/rapidio/rio.h
@@ -22,7 +22,7 @@
 extern u32 rio_mport_get_feature(struct rio_mport *mport, int local, u16 destid,
 				 u8 hopcount, int ftr);
 extern u32 rio_mport_get_physefb(struct rio_mport *port, int local,
-				 u16 destid, u8 hopcount);
+				 u16 destid, u8 hopcount, u32 *rmap);
 extern u32 rio_mport_get_efb(struct rio_mport *port, int local, u16 destid,
 			     u8 hopcount, u32 from);
 extern int rio_mport_chk_dev_access(struct rio_mport *mport, u16 destid,
diff --git a/drivers/rapidio/switches/tsi57x.c b/drivers/rapidio/switches/tsi57x.c
index 42c8b014fe15..2700d15f7584 100644
--- a/drivers/rapidio/switches/tsi57x.c
+++ b/drivers/rapidio/switches/tsi57x.c
@@ -175,12 +175,10 @@ tsi57x_em_init(struct rio_dev *rdev)
 
 		/* Clear all pending interrupts */
 		rio_read_config_32(rdev,
-				rdev->phys_efptr +
-					RIO_PORT_N_ERR_STS_CSR(portnum),
+				RIO_DEV_PORT_N_ERR_STS_CSR(rdev, portnum),
 				&regval);
 		rio_write_config_32(rdev,
-				rdev->phys_efptr +
-					RIO_PORT_N_ERR_STS_CSR(portnum),
+				RIO_DEV_PORT_N_ERR_STS_CSR(rdev, portnum),
 				regval & 0x07120214);
 
 		rio_read_config_32(rdev,
@@ -198,7 +196,7 @@ tsi57x_em_init(struct rio_dev *rdev)
 
 		/* Skip next (odd) port if the current port is in x4 mode */
 		rio_read_config_32(rdev,
-				rdev->phys_efptr + RIO_PORT_N_CTL_CSR(portnum),
+				RIO_DEV_PORT_N_CTL_CSR(rdev, portnum),
 				&regval);
 		if ((regval & RIO_PORT_N_CTL_PWIDTH) == RIO_PORT_N_CTL_PWIDTH_4)
 			portnum++;
@@ -221,23 +219,23 @@ tsi57x_em_handler(struct rio_dev *rdev, u8 portnum)
 	u32 regval;
 
 	rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_ERR_STS_CSR(portnum),
+			RIO_DEV_PORT_N_ERR_STS_CSR(rdev, portnum),
 			&err_status);
 
 	if ((err_status & RIO_PORT_N_ERR_STS_PORT_OK) &&
-	    (err_status & (RIO_PORT_N_ERR_STS_PW_OUT_ES |
-			  RIO_PORT_N_ERR_STS_PW_INP_ES))) {
+	    (err_status & (RIO_PORT_N_ERR_STS_OUT_ES |
+			  RIO_PORT_N_ERR_STS_INP_ES))) {
 		/* Remove any queued packets by locking/unlocking port */
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_CTL_CSR(portnum),
+			RIO_DEV_PORT_N_CTL_CSR(rdev, portnum),
 			&regval);
 		if (!(regval & RIO_PORT_N_CTL_LOCKOUT)) {
 			rio_write_config_32(rdev,
-				rdev->phys_efptr + RIO_PORT_N_CTL_CSR(portnum),
+				RIO_DEV_PORT_N_CTL_CSR(rdev, portnum),
 				regval | RIO_PORT_N_CTL_LOCKOUT);
 			udelay(50);
 			rio_write_config_32(rdev,
-				rdev->phys_efptr + RIO_PORT_N_CTL_CSR(portnum),
+				RIO_DEV_PORT_N_CTL_CSR(rdev, portnum),
 				regval);
 		}
 
@@ -245,7 +243,7 @@ tsi57x_em_handler(struct rio_dev *rdev, u8 portnum)
 		 * valid bit
 		 */
 		rio_read_config_32(rdev,
-			rdev->phys_efptr + RIO_PORT_N_MNT_RSP_CSR(portnum),
+			RIO_DEV_PORT_N_MNT_RSP_CSR(rdev, portnum),
 			&regval);
 
 		/* Send a Packet-Not-Accepted/Link-Request-Input-Status control
@@ -259,8 +257,8 @@ tsi57x_em_handler(struct rio_dev *rdev, u8 portnum)
 			while (checkcount--) {
 				udelay(50);
 				rio_read_config_32(rdev,
-					rdev->phys_efptr +
-						RIO_PORT_N_MNT_RSP_CSR(portnum),
+					RIO_DEV_PORT_N_MNT_RSP_CSR(rdev,
+								   portnum),
 					&regval);
 				if (regval & RIO_PORT_N_MNT_RSP_RVAL)
 					goto exit_es;
diff --git a/include/linux/rio.h b/include/linux/rio.h
index f7ec35b48800..37b95c4af99d 100644
--- a/include/linux/rio.h
+++ b/include/linux/rio.h
@@ -163,6 +163,7 @@ enum rio_device_state {
  * @dst_ops: Destination operation capabilities
  * @comp_tag: RIO component tag
  * @phys_efptr: RIO device extended features pointer
+ * @phys_rmap: LP-Serial Register Map Type (1 or 2)
  * @em_efptr: RIO Error Management features pointer
  * @dma_mask: Mask of bits of RIO address this device implements
  * @driver: Driver claiming this device
@@ -193,6 +194,7 @@ struct rio_dev {
 	u32 dst_ops;
 	u32 comp_tag;
 	u32 phys_efptr;
+	u32 phys_rmap;
 	u32 em_efptr;
 	u64 dma_mask;
 	struct rio_driver *driver;	/* RIO driver claiming this device */
@@ -237,11 +239,6 @@ struct rio_dbell {
 	void *dev_id;
 };
 
-enum rio_phy_type {
-	RIO_PHY_PARALLEL,
-	RIO_PHY_SERIAL,
-};
-
 /**
  * struct rio_mport - RIO master port info
  * @dbells: List of doorbell events
@@ -259,8 +256,8 @@ enum rio_phy_type {
  * @id: Port ID, unique among all ports
  * @index: Port index, unique among all port interfaces of the same type
  * @sys_size: RapidIO common transport system size
- * @phy_type: RapidIO phy type
  * @phys_efptr: RIO port extended features pointer
+ * @phys_rmap: LP-Serial EFB Register Mapping type (1 or 2).
  * @name: Port name string
  * @dev: device structure associated with an mport
  * @priv: Master port private data
@@ -289,8 +286,8 @@ struct rio_mport {
 				 * 0 - Small size. 256 devices.
 				 * 1 - Large size, 65536 devices.
 				 */
-	enum rio_phy_type phy_type;	/* RapidIO phy type */
 	u32 phys_efptr;
+	u32 phys_rmap;
 	unsigned char name[RIO_MAX_MPORT_NAME];
 	struct device dev;
 	void *priv;		/* Master port private data */
diff --git a/include/linux/rio_regs.h b/include/linux/rio_regs.h
index 1063ae382bc2..40c04efe7409 100644
--- a/include/linux/rio_regs.h
+++ b/include/linux/rio_regs.h
@@ -42,9 +42,11 @@
 #define  RIO_PEF_INB_MBOX2		0x00200000	/* [II, <= 1.2] Mailbox 2 */
 #define  RIO_PEF_INB_MBOX3		0x00100000	/* [II, <= 1.2] Mailbox 3 */
 #define  RIO_PEF_INB_DOORBELL		0x00080000	/* [II, <= 1.2] Doorbells */
+#define  RIO_PEF_DEV32			0x00001000	/* [III] PE supports Common Transport Dev32 */
 #define  RIO_PEF_EXT_RT			0x00000200	/* [III, 1.3] Extended route table support */
 #define  RIO_PEF_STD_RT			0x00000100	/* [III, 1.3] Standard route table support */
-#define  RIO_PEF_CTLS			0x00000010	/* [III] CTLS */
+#define  RIO_PEF_CTLS			0x00000010	/* [III] Common Transport Large System (< rev.3) */
+#define  RIO_PEF_DEV16			0x00000010	/* [III] PE Supports Common Transport Dev16 (rev.3) */
 #define  RIO_PEF_EXT_FEATURES		0x00000008	/* [I] EFT_PTR valid */
 #define  RIO_PEF_ADDR_66		0x00000004	/* [I] 66 bits */
 #define  RIO_PEF_ADDR_50		0x00000002	/* [I] 50 bits */
@@ -194,70 +196,101 @@
 #define RIO_GET_BLOCK_ID(x)	(x & RIO_EFB_ID_MASK)
 
 /* Extended Feature Block IDs */
-#define RIO_EFB_PAR_EP_ID	0x0001	/* [IV] LP/LVDS EP Devices */
-#define RIO_EFB_PAR_EP_REC_ID	0x0002	/* [IV] LP/LVDS EP Recovery Devices */
-#define RIO_EFB_PAR_EP_FREE_ID	0x0003	/* [IV] LP/LVDS EP Free Devices */
-#define RIO_EFB_SER_EP_ID_V13P	0x0001	/* [VI] LP/Serial EP Devices, RapidIO Spec ver 1.3 and above */
-#define RIO_EFB_SER_EP_REC_ID_V13P	0x0002	/* [VI] LP/Serial EP Recovery Devices, RapidIO Spec ver 1.3 and above */
-#define RIO_EFB_SER_EP_FREE_ID_V13P	0x0003	/* [VI] LP/Serial EP Free Devices, RapidIO Spec ver 1.3 and above */
-#define RIO_EFB_SER_EP_ID	0x0004	/* [VI] LP/Serial EP Devices */
-#define RIO_EFB_SER_EP_REC_ID	0x0005	/* [VI] LP/Serial EP Recovery Devices */
-#define RIO_EFB_SER_EP_FREE_ID	0x0006	/* [VI] LP/Serial EP Free Devices */
-#define RIO_EFB_SER_EP_FREC_ID	0x0009  /* [VI] LP/Serial EP Free Recovery Devices */
+#define RIO_EFB_SER_EP_M1_ID	0x0001	/* [VI] LP-Serial EP Devices, Map I */
+#define RIO_EFB_SER_EP_SW_M1_ID	0x0002	/* [VI] LP-Serial EP w SW Recovery Devices, Map I */
+#define RIO_EFB_SER_EPF_M1_ID	0x0003	/* [VI] LP-Serial EP Free Devices, Map I */
+#define RIO_EFB_SER_EP_ID	0x0004	/* [VI] LP-Serial EP Devices, RIO 1.2 */
+#define RIO_EFB_SER_EP_REC_ID	0x0005	/* [VI] LP-Serial EP w SW Recovery Devices, RIO 1.2 */
+#define RIO_EFB_SER_EP_FREE_ID	0x0006	/* [VI] LP-Serial EP Free Devices, RIO 1.2 */
 #define RIO_EFB_ERR_MGMNT	0x0007  /* [VIII] Error Management Extensions */
+#define RIO_EFB_SER_EPF_SW_M1_ID	0x0009  /* [VI] LP-Serial EP Free w SW Recovery Devices, Map I */
+#define RIO_EFB_SW_ROUTING_TBL	0x000E  /* [III] Switch Routing Table Block */
+#define RIO_EFB_SER_EP_M2_ID	0x0011	/* [VI] LP-Serial EP Devices, Map II */
+#define RIO_EFB_SER_EP_SW_M2_ID	0x0012	/* [VI] LP-Serial EP w SW Recovery Devices, Map II */
+#define RIO_EFB_SER_EPF_M2_ID	0x0013	/* [VI] LP-Serial EP Free Devices, Map II */
+#define RIO_EFB_ERR_MGMNT_HS	0x0017  /* [VIII] Error Management Extensions, Hot-Swap only */
+#define RIO_EFB_SER_EPF_SW_M2_ID	0x0019  /* [VI] LP-Serial EP Free w SW Recovery Devices, Map II */
 
 /*
- * Physical 8/16 LP-LVDS
- * ID=0x0001, Generic End Point Devices
- * ID=0x0002, Generic End Point Devices, software assisted recovery option
- * ID=0x0003, Generic End Point Free Devices
- *
- * Physical LP-Serial
- * ID=0x0004, Generic End Point Devices
- * ID=0x0005, Generic End Point Devices, software assisted recovery option
- * ID=0x0006, Generic End Point Free Devices
+ * Physical LP-Serial Registers Definitions
+ * Parameters in register macros:
+ *    n - port number, m - Register Map Type (1 or 2)
  */
 #define RIO_PORT_MNT_HEADER		0x0000
 #define RIO_PORT_REQ_CTL_CSR		0x0020
-#define RIO_PORT_RSP_CTL_CSR		0x0024	/* 0x0001/0x0002 */
-#define RIO_PORT_LINKTO_CTL_CSR		0x0020	/* Serial */
-#define RIO_PORT_RSPTO_CTL_CSR		0x0024	/* Serial */
+#define RIO_PORT_RSP_CTL_CSR		0x0024
+#define RIO_PORT_LINKTO_CTL_CSR		0x0020
+#define RIO_PORT_RSPTO_CTL_CSR		0x0024
 #define RIO_PORT_GEN_CTL_CSR		0x003c
 #define  RIO_PORT_GEN_HOST		0x80000000
 #define  RIO_PORT_GEN_MASTER		0x40000000
 #define  RIO_PORT_GEN_DISCOVERED	0x20000000
-#define RIO_PORT_N_MNT_REQ_CSR(x)	(0x0040 + x*0x20)	/* 0x0002 */
+#define RIO_PORT_N_MNT_REQ_CSR(n, m)	(0x40 + (n) * (0x20 * (m)))
 #define  RIO_MNT_REQ_CMD_RD		0x03	/* Reset-device command */
 #define  RIO_MNT_REQ_CMD_IS		0x04	/* Input-status command */
-#define RIO_PORT_N_MNT_RSP_CSR(x)	(0x0044 + x*0x20)	/* 0x0002 */
+#define RIO_PORT_N_MNT_RSP_CSR(n, m)	(0x44 + (n) * (0x20 * (m)))
 #define  RIO_PORT_N_MNT_RSP_RVAL	0x80000000 /* Response Valid */
 #define  RIO_PORT_N_MNT_RSP_ASTAT	0x000007e0 /* ackID Status */
 #define  RIO_PORT_N_MNT_RSP_LSTAT	0x0000001f /* Link Status */
-#define RIO_PORT_N_ACK_STS_CSR(x)	(0x0048 + x*0x20)	/* 0x0002 */
+#define RIO_PORT_N_ACK_STS_CSR(n)	(0x48 + (n) * 0x20) /* Only in RM-I */
 #define  RIO_PORT_N_ACK_CLEAR		0x80000000
 #define  RIO_PORT_N_ACK_INBOUND		0x3f000000
 #define  RIO_PORT_N_ACK_OUTSTAND	0x00003f00
 #define  RIO_PORT_N_ACK_OUTBOUND	0x0000003f
-#define RIO_PORT_N_CTL2_CSR(x)		(0x0054 + x*0x20)
+#define RIO_PORT_N_CTL2_CSR(n, m)	(0x54 + (n) * (0x20 * (m)))
 #define  RIO_PORT_N_CTL2_SEL_BAUD	0xf0000000
-#define RIO_PORT_N_ERR_STS_CSR(x)	(0x0058 + x*0x20)
-#define  RIO_PORT_N_ERR_STS_PW_OUT_ES	0x00010000 /* Output Error-stopped */
-#define  RIO_PORT_N_ERR_STS_PW_INP_ES	0x00000100 /* Input Error-stopped */
+#define RIO_PORT_N_ERR_STS_CSR(n, m)	(0x58 + (n) * (0x20 * (m)))
+#define  RIO_PORT_N_ERR_STS_OUT_ES	0x00010000 /* Output Error-stopped */
+#define  RIO_PORT_N_ERR_STS_INP_ES	0x00000100 /* Input Error-stopped */
 #define  RIO_PORT_N_ERR_STS_PW_PEND	0x00000010 /* Port-Write Pending */
+#define  RIO_PORT_N_ERR_STS_PORT_UA	0x00000008 /* Port Unavailable */
 #define  RIO_PORT_N_ERR_STS_PORT_ERR	0x00000004
 #define  RIO_PORT_N_ERR_STS_PORT_OK	0x00000002
 #define  RIO_PORT_N_ERR_STS_PORT_UNINIT	0x00000001
-#define RIO_PORT_N_CTL_CSR(x)		(0x005c + x*0x20)
+#define RIO_PORT_N_CTL_CSR(n, m)	(0x5c + (n) * (0x20 * (m)))
 #define  RIO_PORT_N_CTL_PWIDTH		0xc0000000
 #define  RIO_PORT_N_CTL_PWIDTH_1	0x00000000
 #define  RIO_PORT_N_CTL_PWIDTH_4	0x40000000
 #define  RIO_PORT_N_CTL_IPW		0x38000000 /* Initialized Port Width */
 #define  RIO_PORT_N_CTL_P_TYP_SER	0x00000001
 #define  RIO_PORT_N_CTL_LOCKOUT		0x00000002
-#define  RIO_PORT_N_CTL_EN_RX_SER	0x00200000
-#define  RIO_PORT_N_CTL_EN_TX_SER	0x00400000
-#define  RIO_PORT_N_CTL_EN_RX_PAR	0x08000000
-#define  RIO_PORT_N_CTL_EN_TX_PAR	0x40000000
+#define  RIO_PORT_N_CTL_EN_RX		0x00200000
+#define  RIO_PORT_N_CTL_EN_TX		0x00400000
+#define RIO_PORT_N_OB_ACK_CSR(n)	(0x60 + (n) * 0x40) /* Only in RM-II */
+#define  RIO_PORT_N_OB_ACK_CLEAR	0x80000000
+#define  RIO_PORT_N_OB_ACK_OUTSTD	0x00fff000
+#define  RIO_PORT_N_OB_ACK_OUTBND	0x00000fff
+#define RIO_PORT_N_IB_ACK_CSR(n)	(0x64 + (n) * 0x40) /* Only in RM-II */
+#define  RIO_PORT_N_IB_ACK_INBND	0x00000fff
+
+/*
+ * Device-based helper macros for serial port register access.
+ *   d - pointer to rapidio device object, n - port number
+ */
+
+#define RIO_DEV_PORT_N_MNT_REQ_CSR(d, n)	\
+		(d->phys_efptr + RIO_PORT_N_MNT_REQ_CSR(n, d->phys_rmap))
+
+#define RIO_DEV_PORT_N_MNT_RSP_CSR(d, n)	\
+		(d->phys_efptr + RIO_PORT_N_MNT_RSP_CSR(n, d->phys_rmap))
+
+#define RIO_DEV_PORT_N_ACK_STS_CSR(d, n)	\
+		(d->phys_efptr + RIO_PORT_N_ACK_STS_CSR(n))
+
+#define RIO_DEV_PORT_N_CTL2_CSR(d, n)		\
+		(d->phys_efptr + RIO_PORT_N_CTL2_CSR(n, d->phys_rmap))
+
+#define RIO_DEV_PORT_N_ERR_STS_CSR(d, n)	\
+		(d->phys_efptr + RIO_PORT_N_ERR_STS_CSR(n, d->phys_rmap))
+
+#define RIO_DEV_PORT_N_CTL_CSR(d, n)		\
+		(d->phys_efptr + RIO_PORT_N_CTL_CSR(n, d->phys_rmap))
+
+#define RIO_DEV_PORT_N_OB_ACK_CSR(d, n)		\
+		(d->phys_efptr + RIO_PORT_N_OB_ACK_CSR(n))
+
+#define RIO_DEV_PORT_N_IB_ACK_CSR(d, n)		\
+		(d->phys_efptr + RIO_PORT_N_IB_ACK_CSR(n))
 
 /*
  * Error Management Extensions (RapidIO 1.3+, Part 8)
@@ -268,6 +301,7 @@
 /* General EM Registers (Common for all Ports) */
 
 #define RIO_EM_EFB_HEADER	0x000	/* Error Management Extensions Block Header */
+#define RIO_EM_EMHS_CAR		0x004	/* EM Functionality CAR */
 #define RIO_EM_LTL_ERR_DETECT	0x008	/* Logical/Transport Layer Error Detect CSR */
 #define RIO_EM_LTL_ERR_EN	0x00c	/* Logical/Transport Layer Error Enable CSR */
 #define  REM_LTL_ERR_ILLTRAN		0x08000000 /* Illegal Transaction decode */
@@ -278,15 +312,33 @@
 #define RIO_EM_LTL_ADDR_CAP	0x014	/* Logical/Transport Layer Address Capture CSR */
 #define RIO_EM_LTL_DEVID_CAP	0x018	/* Logical/Transport Layer Device ID Capture CSR */
 #define RIO_EM_LTL_CTRL_CAP	0x01c	/* Logical/Transport Layer Control Capture CSR */
+#define RIO_EM_LTL_DID32_CAP	0x020	/* Logical/Transport Layer Dev32 DestID Capture CSR */
+#define RIO_EM_LTL_SID32_CAP	0x024	/* Logical/Transport Layer Dev32 SourceID Capture CSR */
 #define RIO_EM_PW_TGT_DEVID	0x028	/* Port-write Target deviceID CSR */
+#define  RIO_EM_PW_TGT_DEVID_D16M	0xff000000	/* Port-write Target DID16 MSB */
+#define  RIO_EM_PW_TGT_DEVID_D8		0x00ff0000	/* Port-write Target DID16 LSB or DID8 */
+#define  RIO_EM_PW_TGT_DEVID_DEV16	0x00008000	/* Port-write Target uses Dev16 deviceID */
+#define  RIO_EM_PW_TGT_DEVID_DEV32	0x00004000	/* Port-write Target uses Dev32 deviceID */
 #define RIO_EM_PKT_TTL		0x02c	/* Packet Time-to-live CSR */
+#define RIO_EM_PKT_TTL_VAL		0xffff0000	/* Packet Time-to-live value */
+#define RIO_EM_PW_TGT32_DEVID	0x030	/* Port-write Dev32 Target deviceID CSR */
+#define RIO_EM_PW_TX_CTRL	0x034	/* Port-write Transmission Control CSR */
+#define RIO_EM_PW_TX_CTRL_PW_DIS	0x00000001	/* Port-write Transmission Disable bit */
 
 /* Per-Port EM Registers */
 
 #define RIO_EM_PN_ERR_DETECT(x)	(0x040 + x*0x40) /* Port N Error Detect CSR */
 #define  REM_PED_IMPL_SPEC		0x80000000
+#define  REM_PED_LINK_OK2U		0x40000000 /* Link OK to Uninit transition */
+#define  REM_PED_LINK_UPDA		0x20000000 /* Link Uninit Packet Discard Active */
+#define  REM_PED_LINK_U2OK		0x10000000 /* Link Uninit to OK transition */
 #define  REM_PED_LINK_TO		0x00000001
+
 #define RIO_EM_PN_ERRRATE_EN(x) (0x044 + x*0x40) /* Port N Error Rate Enable CSR */
+#define RIO_EM_PN_ERRRATE_EN_OK2U	0x40000000 /* Enable notification for OK2U */
+#define RIO_EM_PN_ERRRATE_EN_UPDA	0x20000000 /* Enable notification for UPDA */
+#define RIO_EM_PN_ERRRATE_EN_U2OK	0x10000000 /* Enable notification for U2OK */
+
 #define RIO_EM_PN_ATTRIB_CAP(x)	(0x048 + x*0x40) /* Port N Attributes Capture CSR */
 #define RIO_EM_PN_PKT_CAP_0(x)	(0x04c + x*0x40) /* Port N Packet/Control Symbol Capture 0 CSR */
 #define RIO_EM_PN_PKT_CAP_1(x)	(0x050 + x*0x40) /* Port N Packet Capture 1 CSR */
@@ -294,5 +346,50 @@
 #define RIO_EM_PN_PKT_CAP_3(x)	(0x058 + x*0x40) /* Port N Packet Capture 3 CSR */
 #define RIO_EM_PN_ERRRATE(x)	(0x068 + x*0x40) /* Port N Error Rate CSR */
 #define RIO_EM_PN_ERRRATE_TR(x) (0x06c + x*0x40) /* Port N Error Rate Threshold CSR */
+#define RIO_EM_PN_LINK_UDT(x)	(0x070 + x*0x40) /* Port N Link Uninit Discard Timer CSR */
+#define RIO_EM_PN_LINK_UDT_TO		0xffffff00 /* Link Uninit Timeout value */
+
+/*
+ * Switch Routing Table Register Block ID=0x000E (RapidIO 3.0+, part 3)
+ * Register offsets are defined from beginning of the block.
+ */
+
+/* Broadcast Routing Table Control CSR */
+#define RIO_BC_RT_CTL_CSR	0x020
+#define  RIO_RT_CTL_THREE_LVL		0x80000000
+#define  RIO_RT_CTL_DEV32_RT_CTRL	0x40000000
+#define  RIO_RT_CTL_MC_MASK_SZ		0x03000000 /* 3.0+ Part 11: Multicast */
+
+/* Broadcast Level 0 Info CSR */
+#define RIO_BC_RT_LVL0_INFO_CSR	0x030
+#define  RIO_RT_L0I_NUM_GR		0xff000000
+#define  RIO_RT_L0I_GR_PTR		0x00fffc00
+
+/* Broadcast Level 1 Info CSR */
+#define RIO_BC_RT_LVL1_INFO_CSR	0x034
+#define  RIO_RT_L1I_NUM_GR		0xff000000
+#define  RIO_RT_L1I_GR_PTR		0x00fffc00
+
+/* Broadcast Level 2 Info CSR */
+#define RIO_BC_RT_LVL2_INFO_CSR	0x038
+#define  RIO_RT_L2I_NUM_GR		0xff000000
+#define  RIO_RT_L2I_GR_PTR		0x00fffc00
+
+/* Per-Port Routing Table registers.
+ * Register fields defined in the broadcast section above are
+ * applicable to the corresponding registers below.
+ */
+#define RIO_SPx_RT_CTL_CSR(x)	(0x040 + (0x20 * x))
+#define RIO_SPx_RT_LVL0_INFO_CSR(x)	(0x50 + (0x20 * x))
+#define RIO_SPx_RT_LVL1_INFO_CSR(x)	(0x54 + (0x20 * x))
+#define RIO_SPx_RT_LVL2_INFO_CSR(x)	(0x58 + (0x20 * x))
+
+/* Register Formats for Routing Table Group entry.
+ * Register offsets are calculated using GR_PTR field in the corresponding
+ * table Level_N and group/entry numbers (see RapidIO 3.0+ Part 3).
+ */
+#define RIO_RT_Ln_ENTRY_IMPL_DEF	0xf0000000
+#define RIO_RT_Ln_ENTRY_RTE_VAL		0x000003ff
+#define RIO_RT_ENTRY_DROP_PKT		0x300
 
 #endif				/* LINUX_RIO_REGS_H */

From adff1649e6d66d9dda7631701eb98e8482edaff6 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:07:00 -0700
Subject: [PATCH 100/111] powerpc/fsl_rio: apply changes for RIO spec rev 3

 - Remove check for parallel PHY

 - Set LP-Serial Register Map type

[akpm@linux-foundation.org: fix build]
[alexandre.bounine@idt.com: fix build fix]
 Link: http://lkml.kernel.org/r/20160802184932.2755-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-13-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/powerpc/sysdev/fsl_rio.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)
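
Note (illustration only, not part of the patch): setting phys_rmap = 1
selects the 0x20-byte per-port stride in the reworked LP-Serial register
macros from rio_regs.h. The sketch below restates that offset arithmetic
in plain user-space C. The struct demo_dev type and the demo_* helpers are
hypothetical stand-ins for struct rio_dev and the RIO_DEV_PORT_N_*_CSR()
macros; only the numeric formulas are taken from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the rio_regs.h macros: per-port stride is 0x20 * m. */
#define DEMO_PORT_N_ERR_STS_CSR(n, m)	(0x58 + (n) * (0x20 * (m)))
#define DEMO_PORT_N_CTL_CSR(n, m)	(0x5c + (n) * (0x20 * (m)))

/* Hypothetical stand-in for the two struct rio_dev fields used here. */
struct demo_dev {
	uint32_t phys_efptr;	/* start of the LP-Serial extended features block */
	uint32_t phys_rmap;	/* LP-Serial register map type: 1 or 2 */
};

/* Absolute offset of the Port N Error and Status CSR for this device. */
static inline uint32_t demo_err_sts_csr(const struct demo_dev *d,
					unsigned int port)
{
	return d->phys_efptr + DEMO_PORT_N_ERR_STS_CSR(port, d->phys_rmap);
}

/* Absolute offset of the Port N Control CSR for this device. */
static inline uint32_t demo_ctl_csr(const struct demo_dev *d,
				    unsigned int port)
{
	return d->phys_efptr + DEMO_PORT_N_CTL_CSR(port, d->phys_rmap);
}

/* Worked values for a block at 0x100, as fsl_rio sets phys_efptr. */
static inline void demo_self_check(void)
{
	struct demo_dev map1 = { 0x100, 1 };
	struct demo_dev map2 = { 0x100, 2 };

	assert(demo_err_sts_csr(&map1, 1) == 0x178);	/* 0x100 + 0x58 + 0x20 */
	assert(demo_err_sts_csr(&map2, 1) == 0x198);	/* 0x100 + 0x58 + 0x40 */
}
```

With map type 1 the port registers sit 0x20 bytes apart, with map type 2
they sit 0x40 bytes apart, which is why callers now pass the device rather
than adding phys_efptr by hand.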

diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index 386790cfa16e..984e816f3faf 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -643,19 +643,11 @@ int fsl_rio_setup(struct platform_device *dev)
 		port->ops = ops;
 		port->priv = priv;
 		port->phys_efptr = 0x100;
+		port->phys_rmap = 1;
 		priv->regs_win = rio_regs_win;
 
-		/* Probe the master port phy type */
 		ccsr = in_be32(priv->regs_win + RIO_CCSR + i*0x20);
-		port->phy_type = (ccsr & 1) ? RIO_PHY_SERIAL : RIO_PHY_PARALLEL;
-		if (port->phy_type == RIO_PHY_PARALLEL) {
-			dev_err(&dev->dev, "RIO: Parallel PHY type, unsupported port type!\n");
-			release_resource(&port->iores);
-			kfree(priv);
-			kfree(port);
-			continue;
-		}
-		dev_info(&dev->dev, "RapidIO PHY type: Serial\n");
+
 		/* Checking the port training status */
 		if (in_be32((priv->regs_win + RIO_ESCSR + i*0x20)) & 1) {
 			dev_err(&dev->dev, "Port %d is not ready. "
@@ -705,11 +697,9 @@ int fsl_rio_setup(struct platform_device *dev)
 			((i == 0) ? RIO_INB_ATMU_REGS_PORT1_OFFSET :
 			RIO_INB_ATMU_REGS_PORT2_OFFSET));
 
-
-		/* Set to receive any dist ID for serial RapidIO controller. */
-		if (port->phy_type == RIO_PHY_SERIAL)
-			out_be32((priv->regs_win
-				+ RIO_ISR_AACR + i*0x80), RIO_ISR_AACR_AA);
+		/* Set to receive packets with any dest ID */
+		out_be32((priv->regs_win + RIO_ISR_AACR + i*0x80),
+			 RIO_ISR_AACR_AA);
 
 		/* Configure maintenance transaction window */
 		out_be32(&priv->maint_atmu_regs->rowbar,

From 0b9364b5cf11c6e504f4b77e24b15a0dc8a82df0 Mon Sep 17 00:00:00 2001
From: Alexandre Bounine <alexandre.bounine@idt.com>
Date: Tue, 2 Aug 2016 14:07:03 -0700
Subject: [PATCH 101/111] rapidio/switches: add driver for IDT gen3 switches

Add RapidIO switch driver for IDT Gen3 switch devices: RXS1632 and
RXS2448.

[alexandre.bounine@idt.com: fixup for original driver patch]
  Link: http://lkml.kernel.org/r/1469137596-18241-1-git-send-email-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-14-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/rapidio/switches/Kconfig    |   6 +
 drivers/rapidio/switches/Makefile   |   1 +
 drivers/rapidio/switches/idt_gen3.c | 382 ++++++++++++++++++++++++++++
 include/linux/rio_ids.h             |   2 +
 4 files changed, 391 insertions(+)
 create mode 100644 drivers/rapidio/switches/idt_gen3.c
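
Note (illustration only, not part of the patch): the routing callbacks in
idt_gen3.c choose between the broadcast and the per-port Level 2 routing
table registers. The user-space sketch below restates that selection and
the offset arithmetic. The two register macros copy the formulas from the
driver; the DEMO_* constants and demo_* helpers are hypothetical, and the
0xff values used for the global table and the invalid route are
assumptions about RIO_GLOBAL_TABLE and RIO_INVALID_ROUTE.

```c
#include <stdint.h>

/* Offset formulas as defined at the top of idt_gen3.c. */
#define DEMO_BC_L2_Gn_ENTRYx_CSR(n, x)	(0x31000 + (n)*0x400 + (x)*0x4)
#define DEMO_SPx_L2_Gn_ENTRYy_CSR(x, n, y) \
				(0x51000 + (x)*0x2000 + (n)*0x400 + (y)*0x4)

#define DEMO_GLOBAL_TABLE	0xff	/* assumed value of RIO_GLOBAL_TABLE */
#define DEMO_INVALID_ROUTE	0xff	/* assumed value of RIO_INVALID_ROUTE */
#define DEMO_RT_ENTRY_DROP_PKT	0x300	/* from rio_regs.h in this series */

/*
 * Register offset written by an add_entry-style call, or -1 where the
 * driver would return -EINVAL. Only group 0 is used because the driver
 * handles 8-bit destination IDs only.
 */
static inline int64_t demo_route_entry_offset(uint16_t table,
					      uint16_t route_destid,
					      unsigned int total_ports)
{
	if (route_destid > 0xff)
		return -1;
	if (table == DEMO_GLOBAL_TABLE)
		return DEMO_BC_L2_Gn_ENTRYx_CSR(0, route_destid);
	if (table >= total_ports)
		return -1;
	return DEMO_SPx_L2_Gn_ENTRYy_CSR(table, 0, route_destid);
}

/* Value stored in the entry: a port number or the drop-packet code. */
static inline uint32_t demo_route_entry_value(uint8_t route_port)
{
	if (route_port == DEMO_INVALID_ROUTE)
		return DEMO_RT_ENTRY_DROP_PKT;
	return route_port;
}
```

For example, destination ID 0x10 maps to offset 0x31040 through the
broadcast registers and to 0x55040 in the table of port 2.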

diff --git a/drivers/rapidio/switches/Kconfig b/drivers/rapidio/switches/Kconfig
index 345841562f95..92767fd3b541 100644
--- a/drivers/rapidio/switches/Kconfig
+++ b/drivers/rapidio/switches/Kconfig
@@ -22,3 +22,9 @@ config RAPIDIO_CPS_GEN2
 	default n
 	---help---
 	  Includes support for ITD CPS Gen.2 serial RapidIO switches.
+
+config RAPIDIO_RXS_GEN3
+	tristate "IDT RXS Gen.3 SRIO switch support"
+	default n
+	---help---
+	  Includes support for IDT RXS Gen.3 serial RapidIO switches.
diff --git a/drivers/rapidio/switches/Makefile b/drivers/rapidio/switches/Makefile
index 051cc6b38188..6bdd54c4e733 100644
--- a/drivers/rapidio/switches/Makefile
+++ b/drivers/rapidio/switches/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_RAPIDIO_TSI57X)	+= tsi57x.o
 obj-$(CONFIG_RAPIDIO_CPS_XX)	+= idtcps.o
 obj-$(CONFIG_RAPIDIO_TSI568)	+= tsi568.o
 obj-$(CONFIG_RAPIDIO_CPS_GEN2)	+= idt_gen2.o
+obj-$(CONFIG_RAPIDIO_RXS_GEN3)	+= idt_gen3.o
diff --git a/drivers/rapidio/switches/idt_gen3.c b/drivers/rapidio/switches/idt_gen3.c
new file mode 100644
index 000000000000..c5923a547bed
--- /dev/null
+++ b/drivers/rapidio/switches/idt_gen3.c
@@ -0,0 +1,382 @@
+/*
+ * IDT RXS Gen.3 Serial RapidIO switch family support
+ *
+ * Copyright 2016 Integrated Device Technology, Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/stat.h>
+#include <linux/module.h>
+#include <linux/rio.h>
+#include <linux/rio_drv.h>
+#include <linux/rio_ids.h>
+#include <linux/delay.h>
+
+#include <asm/page.h>
+#include "../rio.h"
+
+#define RIO_EM_PW_STAT		0x40020
+#define RIO_PW_CTL		0x40204
+#define RIO_PW_CTL_PW_TMR		0xffffff00
+#define RIO_PW_ROUTE		0x40208
+
+#define RIO_EM_DEV_INT_EN	0x40030
+
+#define RIO_PLM_SPx_IMP_SPEC_CTL(x)	(0x10100 + (x)*0x100)
+#define RIO_PLM_SPx_IMP_SPEC_CTL_SOFT_RST	0x02000000
+
+#define RIO_PLM_SPx_PW_EN(x)	(0x10118 + (x)*0x100)
+#define RIO_PLM_SPx_PW_EN_OK2U	0x40000000
+#define RIO_PLM_SPx_PW_EN_LINIT 0x10000000
+
+#define RIO_BC_L2_Gn_ENTRYx_CSR(n, x)	(0x31000 + (n)*0x400 + (x)*0x4)
+#define RIO_SPx_L2_Gn_ENTRYy_CSR(x, n, y) \
+				(0x51000 + (x)*0x2000 + (n)*0x400 + (y)*0x4)
+
+static int
+idtg3_route_add_entry(struct rio_mport *mport, u16 destid, u8 hopcount,
+		       u16 table, u16 route_destid, u8 route_port)
+{
+	u32 rval;
+	u32 entry = route_port;
+	int err = 0;
+
+	pr_debug("RIO: %s t=0x%x did_%x to p_%x\n",
+		 __func__, table, route_destid, entry);
+
+	if (route_destid > 0xFF)
+		return -EINVAL;
+
+	if (route_port == RIO_INVALID_ROUTE)
+		entry = RIO_RT_ENTRY_DROP_PKT;
+
+	if (table == RIO_GLOBAL_TABLE) {
+		/* Use broadcast register to update all per-port tables */
+		err = rio_mport_write_config_32(mport, destid, hopcount,
+				RIO_BC_L2_Gn_ENTRYx_CSR(0, route_destid),
+				entry);
+		return err;
+	}
+
+	/*
+	 * Verify that specified port/table number is valid
+	 */
+	err = rio_mport_read_config_32(mport, destid, hopcount,
+				       RIO_SWP_INFO_CAR, &rval);
+	if (err)
+		return err;
+
+	if (table >= RIO_GET_TOTAL_PORTS(rval))
+		return -EINVAL;
+
+	err = rio_mport_write_config_32(mport, destid, hopcount,
+			RIO_SPx_L2_Gn_ENTRYy_CSR(table, 0, route_destid),
+			entry);
+	return err;
+}
+
+static int
+idtg3_route_get_entry(struct rio_mport *mport, u16 destid, u8 hopcount,
+		       u16 table, u16 route_destid, u8 *route_port)
+{
+	u32 rval;
+	int err;
+
+	if (route_destid > 0xFF)
+		return -EINVAL;
+
+	err = rio_mport_read_config_32(mport, destid, hopcount,
+				       RIO_SWP_INFO_CAR, &rval);
+	if (err)
+		return err;
+
+	/*
+	 * This switch device does not have a dedicated global routing table.
+	 * Reads of the global table are served from the routing table of the
+	 * ingress port of the maintenance read request.
+	 */
+	if (table == RIO_GLOBAL_TABLE)
+		table = RIO_GET_PORT_NUM(rval);
+	else if (table >= RIO_GET_TOTAL_PORTS(rval))
+		return -EINVAL;
+
+	err = rio_mport_read_config_32(mport, destid, hopcount,
+			RIO_SPx_L2_Gn_ENTRYy_CSR(table, 0, route_destid),
+			&rval);
+	if (err)
+		return err;
+
+	if (rval == RIO_RT_ENTRY_DROP_PKT)
+		*route_port = RIO_INVALID_ROUTE;
+	else
+		*route_port = (u8)rval;
+
+	return 0;
+}
+
+static int
+idtg3_route_clr_table(struct rio_mport *mport, u16 destid, u8 hopcount,
+		       u16 table)
+{
+	u32 i;
+	u32 rval;
+	int err;
+
+	if (table == RIO_GLOBAL_TABLE) {
+		for (i = 0; i <= 0xff; i++) {
+			err = rio_mport_write_config_32(mport, destid, hopcount,
+						RIO_BC_L2_Gn_ENTRYx_CSR(0, i),
+						RIO_RT_ENTRY_DROP_PKT);
+			if (err)
+				break;
+		}
+
+		return err;
+	}
+
+	err = rio_mport_read_config_32(mport, destid, hopcount,
+				       RIO_SWP_INFO_CAR, &rval);
+	if (err)
+		return err;
+
+	if (table >= RIO_GET_TOTAL_PORTS(rval))
+		return -EINVAL;
+
+	for (i = 0; i <= 0xff; i++) {
+		err = rio_mport_write_config_32(mport, destid, hopcount,
+					RIO_SPx_L2_Gn_ENTRYy_CSR(table, 0, i),
+					RIO_RT_ENTRY_DROP_PKT);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+/*
+ * This routine performs device-specific initialization only.
+ * All standard EM configuration should be performed at upper level.
+ */
+static int
+idtg3_em_init(struct rio_dev *rdev)
+{
+	int i, tmp;
+	u32 rval;
+
+	pr_debug("RIO: %s [%d:%d]\n", __func__, rdev->destid, rdev->hopcount);
+
+	/* Disable assertion of interrupt signal */
+	rio_write_config_32(rdev, RIO_EM_DEV_INT_EN, 0);
+
+	/* Disable port-write event notifications during initialization */
+	rio_write_config_32(rdev, rdev->em_efptr + RIO_EM_PW_TX_CTRL,
+			    RIO_EM_PW_TX_CTRL_PW_DIS);
+
+	/* Configure Port-Write notifications for hot-swap events */
+	tmp = RIO_GET_TOTAL_PORTS(rdev->swpinfo);
+	for (i = 0; i < tmp; i++) {
+
+		rio_read_config_32(rdev,
+			RIO_DEV_PORT_N_ERR_STS_CSR(rdev, i),
+			&rval);
+		if (rval & RIO_PORT_N_ERR_STS_PORT_UA)
+			continue;
+
+		/* Clear events signaled before enabling notification */
+		rio_write_config_32(rdev,
+			rdev->em_efptr + RIO_EM_PN_ERR_DETECT(i), 0);
+
+		/* Enable event notifications */
+		rio_write_config_32(rdev,
+			rdev->em_efptr + RIO_EM_PN_ERRRATE_EN(i),
+			RIO_EM_PN_ERRRATE_EN_OK2U | RIO_EM_PN_ERRRATE_EN_U2OK);
+		/* Enable port-write generation on events */
+		rio_write_config_32(rdev, RIO_PLM_SPx_PW_EN(i),
+			RIO_PLM_SPx_PW_EN_OK2U | RIO_PLM_SPx_PW_EN_LINIT);
+
+	}
+
+	/* Set Port-Write destination port */
+	tmp = RIO_GET_PORT_NUM(rdev->swpinfo);
+	rio_write_config_32(rdev, RIO_PW_ROUTE, 1 << tmp);
+
+
+	/* Enable sending port-write event notifications */
+	rio_write_config_32(rdev, rdev->em_efptr + RIO_EM_PW_TX_CTRL, 0);
+
+	/* set TVAL = ~50us */
+	rio_write_config_32(rdev,
+		rdev->phys_efptr + RIO_PORT_LINKTO_CTL_CSR, 0x8e << 8);
+	return 0;
+}
+
+
+/*
+ * idtg3_em_handler - device-specific error handler
+ *
+ * If the link is down (PORT_UNINIT) does nothing - this is considered
+ * as link partner removal from the port.
+ *
+ * If the link is up (PORT_OK) - situation is handled as *new* device insertion.
+ * In this case ERR_STOP bits are cleared by issuing soft reset command to the
+ * reporting port. Inbound and outbound ackIDs are cleared by the reset as well.
+ * This way the port is synchronized with freshly inserted device (assuming it
+ * was reset/powered-up on insertion).
+ *
+ * TODO: This is not sufficient in a situation when a link between two devices
+ * was down and up again (e.g. cable disconnect). For that situation full ackID
+ * realignment process has to be implemented.
+ */
+static int
+idtg3_em_handler(struct rio_dev *rdev, u8 pnum)
+{
+	u32 err_status;
+	u32 rval;
+
+	rio_read_config_32(rdev,
+			RIO_DEV_PORT_N_ERR_STS_CSR(rdev, pnum),
+			&err_status);
+
+	/* Do nothing for device/link removal */
+	if (err_status & RIO_PORT_N_ERR_STS_PORT_UNINIT)
+		return 0;
+
+	/* When link is OK we have a device insertion.
+	 * Request port soft reset to clear errors if any are present.
+	 * Inbound and outbound ackIDs will be 0 after reset.
+	 */
+	if (err_status & (RIO_PORT_N_ERR_STS_OUT_ES |
+				RIO_PORT_N_ERR_STS_INP_ES)) {
+		rio_read_config_32(rdev, RIO_PLM_SPx_IMP_SPEC_CTL(pnum), &rval);
+		rio_write_config_32(rdev, RIO_PLM_SPx_IMP_SPEC_CTL(pnum),
+				    rval | RIO_PLM_SPx_IMP_SPEC_CTL_SOFT_RST);
+		udelay(10);
+		rio_write_config_32(rdev, RIO_PLM_SPx_IMP_SPEC_CTL(pnum), rval);
+		msleep(500);
+	}
+
+	return 0;
+}
+
+static struct rio_switch_ops idtg3_switch_ops = {
+	.owner = THIS_MODULE,
+	.add_entry = idtg3_route_add_entry,
+	.get_entry = idtg3_route_get_entry,
+	.clr_table = idtg3_route_clr_table,
+	.em_init   = idtg3_em_init,
+	.em_handle = idtg3_em_handler,
+};
+
+static int idtg3_probe(struct rio_dev *rdev, const struct rio_device_id *id)
+{
+	pr_debug("RIO: %s for %s\n", __func__, rio_name(rdev));
+
+	spin_lock(&rdev->rswitch->lock);
+
+	if (rdev->rswitch->ops) {
+		spin_unlock(&rdev->rswitch->lock);
+		return -EINVAL;
+	}
+
+	rdev->rswitch->ops = &idtg3_switch_ops;
+
+	if (rdev->do_enum) {
+		/* Disable hierarchical routing support: Existing fabric
+		 * enumeration/discovery process (see rio-scan.c) uses 8-bit
+		 * flat destination ID routing only.
+		 */
+		rio_write_config_32(rdev, 0x5000 + RIO_BC_RT_CTL_CSR, 0);
+	}
+
+	spin_unlock(&rdev->rswitch->lock);
+
+	return 0;
+}
+
+static void idtg3_remove(struct rio_dev *rdev)
+{
+	pr_debug("RIO: %s for %s\n", __func__, rio_name(rdev));
+	spin_lock(&rdev->rswitch->lock);
+	if (rdev->rswitch->ops == &idtg3_switch_ops)
+		rdev->rswitch->ops = NULL;
+	spin_unlock(&rdev->rswitch->lock);
+}
+
+/*
+ * Gen3 switches repeat sending PW messages until a corresponding event flag
+ * is cleared. Use shutdown notification to disable generation of port-write
+ * messages if their destination node is shut down.
+ */
+static void idtg3_shutdown(struct rio_dev *rdev)
+{
+	int i;
+	u32 rval;
+	u16 destid;
+
+	/* Currently the enumerator node acts also as PW handler */
+	if (!rdev->do_enum)
+		return;
+
+	pr_debug("RIO: %s(%s)\n", __func__, rio_name(rdev));
+
+	rio_read_config_32(rdev, RIO_PW_ROUTE, &rval);
+	i = RIO_GET_PORT_NUM(rdev->swpinfo);
+
+	/* Check port-write destination port */
+	if (!((1 << i) & rval))
+		return;
+
+	/* Disable sending port-write event notifications if PW destID
+	 * matches that of the enumerator node
+	 */
+	rio_read_config_32(rdev, rdev->em_efptr + RIO_EM_PW_TGT_DEVID, &rval);
+
+	if (rval & RIO_EM_PW_TGT_DEVID_DEV16)
+		destid = rval >> 16;
+	else
+		destid = ((rval & RIO_EM_PW_TGT_DEVID_D8) >> 16);
+
+	if (rdev->net->hport->host_deviceid == destid) {
+		rio_write_config_32(rdev, rdev->em_efptr + RIO_EM_PW_TX_CTRL,
+				    RIO_EM_PW_TX_CTRL_PW_DIS);
+		pr_debug("RIO: %s(%s) PW transmission disabled\n",
+			 __func__, rio_name(rdev));
+	}
+}
+
+static struct rio_device_id idtg3_id_table[] = {
+	{RIO_DEVICE(RIO_DID_IDTRXS1632, RIO_VID_IDT)},
+	{RIO_DEVICE(RIO_DID_IDTRXS2448, RIO_VID_IDT)},
+	{ 0, }	/* terminate list */
+};
+
+static struct rio_driver idtg3_driver = {
+	.name = "idt_gen3",
+	.id_table = idtg3_id_table,
+	.probe = idtg3_probe,
+	.remove = idtg3_remove,
+	.shutdown = idtg3_shutdown,
+};
+
+static int __init idtg3_init(void)
+{
+	return rio_register_driver(&idtg3_driver);
+}
+
+static void __exit idtg3_exit(void)
+{
+	pr_debug("RIO: %s\n", __func__);
+	rio_unregister_driver(&idtg3_driver);
+	pr_debug("RIO: %s done\n", __func__);
+}
+
+device_initcall(idtg3_init);
+module_exit(idtg3_exit);
+
+MODULE_DESCRIPTION("IDT RXS Gen.3 Serial RapidIO switch family driver");
+MODULE_AUTHOR("Integrated Device Technology, Inc.");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/rio_ids.h b/include/linux/rio_ids.h
index 2543bc163d54..334c576c151c 100644
--- a/include/linux/rio_ids.h
+++ b/include/linux/rio_ids.h
@@ -38,5 +38,7 @@
 #define RIO_DID_IDTVPS1616		0x0377
 #define RIO_DID_IDTSPS1616		0x0378
 #define RIO_DID_TSI721			0x80ab
+#define RIO_DID_IDTRXS1632		0x80e5
+#define RIO_DID_IDTRXS2448		0x80e6
 
 #endif				/* LINUX_RIO_IDS_H */

From 098f9fb0c962eb2fdba5f9d34f4cf7a938237184 Mon Sep 17 00:00:00 2001
From: "Andrew F. Davis" <afd@ti.com>
Date: Tue, 2 Aug 2016 14:07:06 -0700
Subject: [PATCH 102/111] w1: remove need for ida and use PLATFORM_DEVID_AUTO

PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA.  Do this in all
applicable drivers.

Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/w1/slaves/w1_ds2760.c | 28 +++++-----------------------
 drivers/w1/slaves/w1_ds2780.c | 25 ++++---------------------
 drivers/w1/slaves/w1_ds2781.c | 26 ++++----------------------
 3 files changed, 13 insertions(+), 66 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds2760.c b/drivers/w1/slaves/w1_ds2760.c
index d9079d48d112..59a81cfe64d5 100644
--- a/drivers/w1/slaves/w1_ds2760.c
+++ b/drivers/w1/slaves/w1_ds2760.c
@@ -121,25 +121,14 @@ static const struct attribute_group *w1_ds2760_groups[] = {
 	NULL,
 };
 
-static DEFINE_IDA(bat_ida);
-
 static int w1_ds2760_add_slave(struct w1_slave *sl)
 {
 	int ret;
-	int id;
 	struct platform_device *pdev;
 
-	id = ida_simple_get(&bat_ida, 0, 0, GFP_KERNEL);
-	if (id < 0) {
-		ret = id;
-		goto noid;
-	}
-
-	pdev = platform_device_alloc("ds2760-battery", id);
-	if (!pdev) {
-		ret = -ENOMEM;
-		goto pdev_alloc_failed;
-	}
+	pdev = platform_device_alloc("ds2760-battery", PLATFORM_DEVID_AUTO);
+	if (!pdev)
+		return -ENOMEM;
 	pdev->dev.parent = &sl->dev;
 
 	ret = platform_device_add(pdev);
@@ -148,24 +137,19 @@ static int w1_ds2760_add_slave(struct w1_slave *sl)
 
 	dev_set_drvdata(&sl->dev, pdev);
 
-	goto success;
+	return 0;
 
 pdev_add_failed:
 	platform_device_put(pdev);
-pdev_alloc_failed:
-	ida_simple_remove(&bat_ida, id);
-noid:
-success:
+
 	return ret;
 }
 
 static void w1_ds2760_remove_slave(struct w1_slave *sl)
 {
 	struct platform_device *pdev = dev_get_drvdata(&sl->dev);
-	int id = pdev->id;
 
 	platform_device_unregister(pdev);
-	ida_simple_remove(&bat_ida, id);
 }
 
 static struct w1_family_ops w1_ds2760_fops = {
@@ -182,14 +166,12 @@ static struct w1_family w1_ds2760_family = {
 static int __init w1_ds2760_init(void)
 {
 	pr_info("1-Wire driver for the DS2760 battery monitor chip - (c) 2004-2005, Szabolcs Gyurko\n");
-	ida_init(&bat_ida);
 	return w1_register_family(&w1_ds2760_family);
 }
 
 static void __exit w1_ds2760_exit(void)
 {
 	w1_unregister_family(&w1_ds2760_family);
-	ida_destroy(&bat_ida);
 }
 
 EXPORT_SYMBOL(w1_ds2760_read);
diff --git a/drivers/w1/slaves/w1_ds2780.c b/drivers/w1/slaves/w1_ds2780.c
index 50e85f7929d4..e63eb86d66f1 100644
--- a/drivers/w1/slaves/w1_ds2780.c
+++ b/drivers/w1/slaves/w1_ds2780.c
@@ -113,25 +113,14 @@ static const struct attribute_group *w1_ds2780_groups[] = {
 	NULL,
 };
 
-static DEFINE_IDA(bat_ida);
-
 static int w1_ds2780_add_slave(struct w1_slave *sl)
 {
 	int ret;
-	int id;
 	struct platform_device *pdev;
 
-	id = ida_simple_get(&bat_ida, 0, 0, GFP_KERNEL);
-	if (id < 0) {
-		ret = id;
-		goto noid;
-	}
-
-	pdev = platform_device_alloc("ds2780-battery", id);
-	if (!pdev) {
-		ret = -ENOMEM;
-		goto pdev_alloc_failed;
-	}
+	pdev = platform_device_alloc("ds2780-battery", PLATFORM_DEVID_AUTO);
+	if (!pdev)
+		return -ENOMEM;
 	pdev->dev.parent = &sl->dev;
 
 	ret = platform_device_add(pdev);
@@ -144,19 +133,15 @@ static int w1_ds2780_add_slave(struct w1_slave *sl)
 
 pdev_add_failed:
 	platform_device_put(pdev);
-pdev_alloc_failed:
-	ida_simple_remove(&bat_ida, id);
-noid:
+
 	return ret;
 }
 
 static void w1_ds2780_remove_slave(struct w1_slave *sl)
 {
 	struct platform_device *pdev = dev_get_drvdata(&sl->dev);
-	int id = pdev->id;
 
 	platform_device_unregister(pdev);
-	ida_simple_remove(&bat_ida, id);
 }
 
 static struct w1_family_ops w1_ds2780_fops = {
@@ -172,14 +157,12 @@ static struct w1_family w1_ds2780_family = {
 
 static int __init w1_ds2780_init(void)
 {
-	ida_init(&bat_ida);
 	return w1_register_family(&w1_ds2780_family);
 }
 
 static void __exit w1_ds2780_exit(void)
 {
 	w1_unregister_family(&w1_ds2780_family);
-	ida_destroy(&bat_ida);
 }
 
 module_init(w1_ds2780_init);
diff --git a/drivers/w1/slaves/w1_ds2781.c b/drivers/w1/slaves/w1_ds2781.c
index 1eb98fb1688d..99b0f4dc0e31 100644
--- a/drivers/w1/slaves/w1_ds2781.c
+++ b/drivers/w1/slaves/w1_ds2781.c
@@ -17,7 +17,6 @@
 #include <linux/types.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
-#include <linux/idr.h>
 
 #include "../w1.h"
 #include "../w1_int.h"
@@ -111,25 +110,14 @@ static const struct attribute_group *w1_ds2781_groups[] = {
 	NULL,
 };
 
-static DEFINE_IDA(bat_ida);
-
 static int w1_ds2781_add_slave(struct w1_slave *sl)
 {
 	int ret;
-	int id;
 	struct platform_device *pdev;
 
-	id = ida_simple_get(&bat_ida, 0, 0, GFP_KERNEL);
-	if (id < 0) {
-		ret = id;
-		goto noid;
-	}
-
-	pdev = platform_device_alloc("ds2781-battery", id);
-	if (!pdev) {
-		ret = -ENOMEM;
-		goto pdev_alloc_failed;
-	}
+	pdev = platform_device_alloc("ds2781-battery", PLATFORM_DEVID_AUTO);
+	if (!pdev)
+		return -ENOMEM;
 	pdev->dev.parent = &sl->dev;
 
 	ret = platform_device_add(pdev);
@@ -142,19 +130,15 @@ static int w1_ds2781_add_slave(struct w1_slave *sl)
 
 pdev_add_failed:
 	platform_device_put(pdev);
-pdev_alloc_failed:
-	ida_simple_remove(&bat_ida, id);
-noid:
+
 	return ret;
 }
 
 static void w1_ds2781_remove_slave(struct w1_slave *sl)
 {
 	struct platform_device *pdev = dev_get_drvdata(&sl->dev);
-	int id = pdev->id;
 
 	platform_device_unregister(pdev);
-	ida_simple_remove(&bat_ida, id);
 }
 
 static struct w1_family_ops w1_ds2781_fops = {
@@ -170,14 +154,12 @@ static struct w1_family w1_ds2781_family = {
 
 static int __init w1_ds2781_init(void)
 {
-	ida_init(&bat_ida);
 	return w1_register_family(&w1_ds2781_family);
 }
 
 static void __exit w1_ds2781_exit(void)
 {
 	w1_unregister_family(&w1_ds2781_family);
-	ida_destroy(&bat_ida);
 }
 
 module_init(w1_ds2781_init);

From 939fc832290d548a02b6a309992b3c1ff7de1ff9 Mon Sep 17 00:00:00 2001
From: "Andrew F. Davis" <afd@ti.com>
Date: Tue, 2 Aug 2016 14:07:09 -0700
Subject: [PATCH 103/111] w1: add helper macro module_w1_family

The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions.  Add this
macro and use it in all applicable drivers.
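
For illustration, the effect of the macro can be modelled in plain
userspace C.  This is only a sketch under stated assumptions: struct
w1_family, the two registration functions, the *_model macros and the
0x12 family id below are simplified stand-ins, not the kernel
definitions.  The real module_driver() additionally marks the generated
functions __init/__exit and hooks them up through module_init() and
module_exit().

```c
#include <assert.h>

/* Stand-in for the kernel's struct w1_family (hypothetical, reduced). */
struct w1_family {
	unsigned char fid;
	int registered;	/* model-only bookkeeping, not a kernel field */
};

/* Stand-ins for the real registration calls. */
static int w1_register_family(struct w1_family *f)
{
	f->registered = 1;
	return 0;
}

static void w1_unregister_family(struct w1_family *f)
{
	f->registered = 0;
}

/* Simplified model of module_driver(): emits <name>_init and <name>_exit. */
#define module_driver_model(drv, reg, unreg) \
static int drv##_init(void) \
{ \
	return reg(&(drv)); \
} \
static void drv##_exit(void) \
{ \
	unreg(&(drv)); \
}

/* Model of module_w1_family(): fixes the register/unregister pair. */
#define module_w1_family_model(fam) \
	module_driver_model(fam, w1_register_family, w1_unregister_family)

static struct w1_family w1_family_12 = { .fid = 0x12 };
module_w1_family_model(w1_family_12)

/* Exercise the generated pair once. */
static void w1_family_model_selfcheck(void)
{
	assert(w1_family_12_init() == 0);
	assert(w1_family_12.registered == 1);
	w1_family_12_exit();
	assert(w1_family_12.registered == 0);
}
```

In this model, w1_family_12_init() registers the family and
w1_family_12_exit() unregisters it, which is the boilerplate the
converted drivers no longer spell out by hand.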

Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com
Signed-off-by: Andrew F. Davis <afd@ti.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/w1/slaves/w1_ds2406.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2408.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2413.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2423.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2431.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2433.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2760.c  | 15 +--------------
 drivers/w1/slaves/w1_ds2780.c  | 14 +-------------
 drivers/w1/slaves/w1_ds2781.c  | 14 +-------------
 drivers/w1/slaves/w1_ds28e04.c | 14 +-------------
 drivers/w1/w1_family.h         | 12 ++++++++++++
 11 files changed, 22 insertions(+), 131 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds2406.c b/drivers/w1/slaves/w1_ds2406.c
index d488961a8c90..51f2f66d6555 100644
--- a/drivers/w1/slaves/w1_ds2406.c
+++ b/drivers/w1/slaves/w1_ds2406.c
@@ -153,16 +153,4 @@ static struct w1_family w1_family_12 = {
 	.fid = W1_FAMILY_DS2406,
 	.fops = &w1_f12_fops,
 };
-
-static int __init w1_f12_init(void)
-{
-	return w1_register_family(&w1_family_12);
-}
-
-static void __exit w1_f12_exit(void)
-{
-	w1_unregister_family(&w1_family_12);
-}
-
-module_init(w1_f12_init);
-module_exit(w1_f12_exit);
+module_w1_family(w1_family_12);
diff --git a/drivers/w1/slaves/w1_ds2408.c b/drivers/w1/slaves/w1_ds2408.c
index 7dfa0e11688a..aec5958e66e9 100644
--- a/drivers/w1/slaves/w1_ds2408.c
+++ b/drivers/w1/slaves/w1_ds2408.c
@@ -351,16 +351,4 @@ static struct w1_family w1_family_29 = {
 	.fid = W1_FAMILY_DS2408,
 	.fops = &w1_f29_fops,
 };
-
-static int __init w1_f29_init(void)
-{
-	return w1_register_family(&w1_family_29);
-}
-
-static void __exit w1_f29_exit(void)
-{
-	w1_unregister_family(&w1_family_29);
-}
-
-module_init(w1_f29_init);
-module_exit(w1_f29_exit);
+module_w1_family(w1_family_29);
diff --git a/drivers/w1/slaves/w1_ds2413.c b/drivers/w1/slaves/w1_ds2413.c
index ee28fc1ff390..f2e1c51533b9 100644
--- a/drivers/w1/slaves/w1_ds2413.c
+++ b/drivers/w1/slaves/w1_ds2413.c
@@ -135,16 +135,4 @@ static struct w1_family w1_family_3a = {
 	.fid = W1_FAMILY_DS2413,
 	.fops = &w1_f3a_fops,
 };
-
-static int __init w1_f3a_init(void)
-{
-	return w1_register_family(&w1_family_3a);
-}
-
-static void __exit w1_f3a_exit(void)
-{
-	w1_unregister_family(&w1_family_3a);
-}
-
-module_init(w1_f3a_init);
-module_exit(w1_f3a_exit);
+module_w1_family(w1_family_3a);
diff --git a/drivers/w1/slaves/w1_ds2423.c b/drivers/w1/slaves/w1_ds2423.c
index 7e41b7d91fb5..4ab54fd9dde2 100644
--- a/drivers/w1/slaves/w1_ds2423.c
+++ b/drivers/w1/slaves/w1_ds2423.c
@@ -138,19 +138,7 @@ static struct w1_family w1_family_1d = {
 	.fid = W1_COUNTER_DS2423,
 	.fops = &w1_f1d_fops,
 };
-
-static int __init w1_f1d_init(void)
-{
-	return w1_register_family(&w1_family_1d);
-}
-
-static void __exit w1_f1d_exit(void)
-{
-	w1_unregister_family(&w1_family_1d);
-}
-
-module_init(w1_f1d_init);
-module_exit(w1_f1d_exit);
+module_w1_family(w1_family_1d);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Mika Laitio <lamikr@pilppa.org>");
diff --git a/drivers/w1/slaves/w1_ds2431.c b/drivers/w1/slaves/w1_ds2431.c
index 9c4ff9d28adc..80572cb63ba8 100644
--- a/drivers/w1/slaves/w1_ds2431.c
+++ b/drivers/w1/slaves/w1_ds2431.c
@@ -288,19 +288,7 @@ static struct w1_family w1_family_2d = {
 	.fid = W1_EEPROM_DS2431,
 	.fops = &w1_f2d_fops,
 };
-
-static int __init w1_f2d_init(void)
-{
-	return w1_register_family(&w1_family_2d);
-}
-
-static void __exit w1_f2d_fini(void)
-{
-	w1_unregister_family(&w1_family_2d);
-}
-
-module_init(w1_f2d_init);
-module_exit(w1_f2d_fini);
+module_w1_family(w1_family_2d);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Bernhard Weirich <bernhard.weirich@riedel.net>");
diff --git a/drivers/w1/slaves/w1_ds2433.c b/drivers/w1/slaves/w1_ds2433.c
index 72319a968a9e..6cf378c89ecb 100644
--- a/drivers/w1/slaves/w1_ds2433.c
+++ b/drivers/w1/slaves/w1_ds2433.c
@@ -305,16 +305,4 @@ static struct w1_family w1_family_23 = {
 	.fid = W1_EEPROM_DS2433,
 	.fops = &w1_f23_fops,
 };
-
-static int __init w1_f23_init(void)
-{
-	return w1_register_family(&w1_family_23);
-}
-
-static void __exit w1_f23_fini(void)
-{
-	w1_unregister_family(&w1_family_23);
-}
-
-module_init(w1_f23_init);
-module_exit(w1_f23_fini);
+module_w1_family(w1_family_23);
diff --git a/drivers/w1/slaves/w1_ds2760.c b/drivers/w1/slaves/w1_ds2760.c
index 59a81cfe64d5..ffa37f773b3b 100644
--- a/drivers/w1/slaves/w1_ds2760.c
+++ b/drivers/w1/slaves/w1_ds2760.c
@@ -162,26 +162,13 @@ static struct w1_family w1_ds2760_family = {
 	.fid = W1_FAMILY_DS2760,
 	.fops = &w1_ds2760_fops,
 };
-
-static int __init w1_ds2760_init(void)
-{
-	pr_info("1-Wire driver for the DS2760 battery monitor chip - (c) 2004-2005, Szabolcs Gyurko\n");
-	return w1_register_family(&w1_ds2760_family);
-}
-
-static void __exit w1_ds2760_exit(void)
-{
-	w1_unregister_family(&w1_ds2760_family);
-}
+module_w1_family(w1_ds2760_family);
 
 EXPORT_SYMBOL(w1_ds2760_read);
 EXPORT_SYMBOL(w1_ds2760_write);
 EXPORT_SYMBOL(w1_ds2760_store_eeprom);
 EXPORT_SYMBOL(w1_ds2760_recall_eeprom);
 
-module_init(w1_ds2760_init);
-module_exit(w1_ds2760_exit);
-
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Szabolcs Gyurko <szabolcs.gyurko@tlt.hu>");
 MODULE_DESCRIPTION("1-wire Driver Dallas 2760 battery monitor chip");
diff --git a/drivers/w1/slaves/w1_ds2780.c b/drivers/w1/slaves/w1_ds2780.c
index e63eb86d66f1..f5c2aa429a92 100644
--- a/drivers/w1/slaves/w1_ds2780.c
+++ b/drivers/w1/slaves/w1_ds2780.c
@@ -154,19 +154,7 @@ static struct w1_family w1_ds2780_family = {
 	.fid = W1_FAMILY_DS2780,
 	.fops = &w1_ds2780_fops,
 };
-
-static int __init w1_ds2780_init(void)
-{
-	return w1_register_family(&w1_ds2780_family);
-}
-
-static void __exit w1_ds2780_exit(void)
-{
-	w1_unregister_family(&w1_ds2780_family);
-}
-
-module_init(w1_ds2780_init);
-module_exit(w1_ds2780_exit);
+module_w1_family(w1_ds2780_family);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Clifton Barnes <cabarnes@indesign-llc.com>");
diff --git a/drivers/w1/slaves/w1_ds2781.c b/drivers/w1/slaves/w1_ds2781.c
index 99b0f4dc0e31..9c03e014cf9e 100644
--- a/drivers/w1/slaves/w1_ds2781.c
+++ b/drivers/w1/slaves/w1_ds2781.c
@@ -151,19 +151,7 @@ static struct w1_family w1_ds2781_family = {
 	.fid = W1_FAMILY_DS2781,
 	.fops = &w1_ds2781_fops,
 };
-
-static int __init w1_ds2781_init(void)
-{
-	return w1_register_family(&w1_ds2781_family);
-}
-
-static void __exit w1_ds2781_exit(void)
-{
-	w1_unregister_family(&w1_ds2781_family);
-}
-
-module_init(w1_ds2781_init);
-module_exit(w1_ds2781_exit);
+module_w1_family(w1_ds2781_family);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Renata Sayakhova <renata@oktetlabs.ru>");
diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index 365d6dff21de..5e348d38ec5c 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -427,16 +427,4 @@ static struct w1_family w1_family_1C = {
 	.fid = W1_FAMILY_DS28E04,
 	.fops = &w1_f1C_fops,
 };
-
-static int __init w1_f1C_init(void)
-{
-	return w1_register_family(&w1_family_1C);
-}
-
-static void __exit w1_f1C_fini(void)
-{
-	w1_unregister_family(&w1_family_1C);
-}
-
-module_init(w1_f1C_init);
-module_exit(w1_f1C_fini);
+module_w1_family(w1_family_1C);
diff --git a/drivers/w1/w1_family.h b/drivers/w1/w1_family.h
index ed5dcb80a1f7..10a7a0767187 100644
--- a/drivers/w1/w1_family.h
+++ b/drivers/w1/w1_family.h
@@ -88,4 +88,16 @@ struct w1_family * w1_family_registered(u8);
 void w1_unregister_family(struct w1_family *);
 int w1_register_family(struct w1_family *);
 
+/**
+ * module_w1_family() - Helper macro for registering a 1-Wire family
+ * @__w1_family: w1_family struct
+ *
+ * Helper macro for 1-Wire families which do not do anything special in module
+ * init/exit. This eliminates a lot of boilerplate. Each module may only
+ * use this macro once, and calling it replaces module_init() and module_exit()
+ */
+#define module_w1_family(__w1_family) \
+	module_driver(__w1_family, w1_register_family, \
+			w1_unregister_family)
+
 #endif /* __W1_FAMILY_H */

From ecfaf0c42fc4306b5ec4bf6be01b66f8fe9a9733 Mon Sep 17 00:00:00 2001
From: "H. Nikolaus Schaller" <hns@goldelico.com>
Date: Tue, 2 Aug 2016 14:07:12 -0700
Subject: [PATCH 104/111] w1:omap_hdq: fix regression

Commit e93762bbf681 ("w1: masters: omap_hdq: add support for 1-wire
mode") added a statement to clear the hdq_irqstatus flags in
hdq_read_byte().

If the hdq reading process is scheduled slowly or interrupts are
disabled for a while, the hardware read activity might already be
finished on entry to hdq_read_byte().  In that case hdq_isr() has
already set hdq_irqstatus to 0x6 (can be seen in debug mode), denoting
that both the TXCOMPLETE and RXCOMPLETE interrupts occurred in parallel.

This means there is no need to wait and the hdq_read_byte() can just
read the byte from the hdq controller.

Resetting hdq_irqstatus to 0 forces the read process to always wait
again (because the if statement always succeeds), but the hardware will
not issue another RXCOMPLETE interrupt.  This results in a false
timeout.

After such a situation the hdq bus hangs.
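
The failure mode can be sketched with a small userspace model.  This is
an illustration, not driver code: struct hdq_model, read_path_waits()
and the MODEL_* bit positions are assumptions made for the example;
only the combined status value 0x6 is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout; only the combined value 0x6 comes from the text. */
#define MODEL_INT_STATUS_RXCOMPLETE (1u << 1)
#define MODEL_INT_STATUS_TXCOMPLETE (1u << 2)

/* Reduced stand-in for struct hdq_data. */
struct hdq_model {
	unsigned int hdq_irqstatus;
};

/*
 * Returns true when the read path would start waiting for a new
 * RXCOMPLETE interrupt.  clear_first models the removed assignment.
 */
static bool read_path_waits(struct hdq_model *hdq, bool clear_first)
{
	if (clear_first)
		hdq->hdq_irqstatus = 0;

	return !(hdq->hdq_irqstatus & MODEL_INT_STATUS_RXCOMPLETE);
}

static void hdq_model_selfcheck(void)
{
	struct hdq_model hdq;

	/* The ISR already recorded both completions before the read started. */
	hdq.hdq_irqstatus = MODEL_INT_STATUS_TXCOMPLETE |
			    MODEL_INT_STATUS_RXCOMPLETE;
	assert(hdq.hdq_irqstatus == 0x6);

	/* With the clear in place, the completed read is forgotten. */
	assert(read_path_waits(&hdq, true));

	/* Without it, the byte can be read immediately. */
	hdq.hdq_irqstatus = 0x6;
	assert(!read_path_waits(&hdq, false));
}
```

In the model, clearing the status first always sends the reader into
the wait branch, even though no further interrupt will arrive.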

Link: http://lkml.kernel.org/r/b724765f87ad276a69625bc19806c8c8844c4590.1469513669.git.hns@goldelico.com
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 drivers/w1/masters/omap_hdq.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index a2eec97d5064..bb09de633939 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -390,8 +390,6 @@ static int hdq_read_byte(struct hdq_data *hdq_data, u8 *val)
 		goto out;
 	}
 
-	hdq_data->hdq_irqstatus = 0;
-
 	if (!(hdq_data->hdq_irqstatus & OMAP_HDQ_INT_STATUS_RXCOMPLETE)) {
 		hdq_reg_merge(hdq_data, OMAP_HDQ_CTRL_STATUS,
 			OMAP_HDQ_CTRL_STATUS_DIR | OMAP_HDQ_CTRL_STATUS_GO,

From 841c06d71e25a4e5fe8f7ed4ba7ba4324397f910 Mon Sep 17 00:00:00 2001
From: Prarit Bhargava <prarit@redhat.com>
Date: Tue, 2 Aug 2016 14:07:15 -0700
Subject: [PATCH 105/111] init: allow blacklisting of module_init functions

sprint_symbol_no_offset() returns the string "function_name
[module_name]" where [module_name] is not printed for built-in kernel
functions.  This means that the blacklisting code will fail when
comparing module function names with the extended string.

This patch adds the functionality to block a module's module_init()
function by finding the space in the string and truncating the
comparison to that length.
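
The truncation can be sketched in userspace as follows.  This is a
minimal illustration, not the kernel code: strreplace_standin() is a
local stand-in for the kernel's strreplace(), initcall_name_matches()
is a hypothetical helper, and the symbol and module names are made up
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the kernel's strreplace(): swap old_ch for new_ch in place. */
static char *strreplace_standin(char *s, char old_ch, char new_ch)
{
	for (; *s; ++s)
		if (*s == old_ch)
			*s = new_ch;
	return s;
}

/* Hypothetical helper: compare a symbol string with one blacklist entry. */
static bool initcall_name_matches(const char *symbol, const char *entry)
{
	char fn_name[128];

	/* Work on a private copy, since the replacement mutates the buffer. */
	strncpy(fn_name, symbol, sizeof(fn_name) - 1);
	fn_name[sizeof(fn_name) - 1] = '\0';

	/* "func [module]" becomes "func\0[module]"; strcmp() sees "func". */
	strreplace_standin(fn_name, ' ', '\0');

	return strcmp(fn_name, entry) == 0;
}

static void blacklist_selfcheck(void)
{
	/* Module init function: the " [module]" suffix is ignored. */
	assert(initcall_name_matches("w1_f12_init [w1_ds2406]", "w1_f12_init"));
	/* Built-in function: there is no suffix, so nothing changes. */
	assert(initcall_name_matches("idtg3_init", "idtg3_init"));
	/* Still an exact match on the function name, not a prefix match. */
	assert(!initcall_name_matches("w1_f12_init [w1_ds2406]", "w1_f12"));
}
```

Because the first space is overwritten with a NUL byte, the comparison
stays an exact match on the function name rather than a prefix match.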

Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 init/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/init/main.c b/init/main.c
index e7345dcaaf05..a8a58e2794a5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -716,6 +716,12 @@ static bool __init_or_module initcall_blacklisted(initcall_t fn)
 	addr = (unsigned long) dereference_function_descriptor(fn);
 	sprint_symbol_no_offset(fn_name, addr);
 
+	/*
+	 * fn will be "function_name [module_name]" where [module_name] is not
+	 * displayed for built-in init functions.  Strip off the [module_name].
+	 */
+	strreplace(fn_name, ' ', '\0');
+
 	list_for_each_entry(entry, &blacklisted_initcalls, next) {
 		if (!strcmp(fn_name, entry->buf)) {
 			pr_debug("initcall %s blacklisted\n", fn_name);

From 59dbb2a06fc2bcb752b964e036884fe9bb9dbbe0 Mon Sep 17 00:00:00 2001
From: Akash Goel <akash.goel@intel.com>
Date: Tue, 2 Aug 2016 14:07:18 -0700
Subject: [PATCH 106/111] relay: add global mode support for buffer-only
 channels

Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.

This is useful when the exact location of the relay file is not known,
or the parent directory of the relay file is not available, at the time
the channel is created and logging has to start right from boot.

But there was no provision to use global mode with buffer-only channels.
This patch adds it without modifying the interface: initially there is a
dummy invocation of the create_buf_file callback, through which the
kernel client can convey the need for a global buffer.

For use cases where drivers/kernel clients want a simple interface for
userspace, one that lets them capture data/logs from the relay file in
order and without any post-processing, support for global buffer mode is
warranted.

Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files().  Hence relay_late_setup_files() symbol also
needs to be exported.

Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com
Signed-off-by: Akash Goel <akash.goel@intel.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/relay.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/kernel/relay.c b/kernel/relay.c
index 04d7cf3ef8cf..d797502140b9 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -451,6 +451,13 @@ static struct rchan_buf *relay_open_buf(struct rchan *chan, unsigned int cpu)
 		if (!dentry)
 			goto free_buf;
 		relay_set_buf_dentry(buf, dentry);
+	} else {
+		/* Only retrieve global info, nothing more, nothing less */
+		dentry = chan->cb->create_buf_file(NULL, NULL,
+						   S_IRUSR, buf,
+						   &chan->is_global);
+		if (WARN_ON(dentry))
+			goto free_buf;
 	}
 
  	buf->cpu = cpu;
@@ -562,6 +569,10 @@ static int relay_hotcpu_callback(struct notifier_block *nb,
  *	attributes specified.  The created channel buffer files
  *	will be named base_filename0...base_filenameN-1.  File
  *	permissions will be %S_IRUSR.
+ *
+ *	If opening a buffer (@parent = NULL) that you later wish to register
+ *	in a filesystem, call relay_late_setup_files() once the @parent dentry
+ *	is available.
  */
 struct rchan *relay_open(const char *base_filename,
 			 struct dentry *parent,
@@ -640,8 +651,12 @@ static void __relay_set_buf_dentry(void *info)
  *
  *	Returns 0 if successful, non-zero otherwise.
  *
- *	Use to setup files for a previously buffer-only channel.
- *	Useful to do early tracing in kernel, before VFS is up, for example.
+ *	Use to setup files for a previously buffer-only channel created
+ *	by relay_open() with a NULL parent dentry.
+ *
+ *	For example, this is useful for performing early tracing in kernel,
+ *	before VFS is up and then exposing the early results once the dentry
+ *	is available.
  */
 int relay_late_setup_files(struct rchan *chan,
 			   const char *base_filename,
@@ -666,6 +681,20 @@ int relay_late_setup_files(struct rchan *chan,
 	}
 	chan->has_base_filename = 1;
 	chan->parent = parent;
+
+	if (chan->is_global) {
+		err = -EINVAL;
+		if (!WARN_ON_ONCE(!chan->buf[0])) {
+			dentry = relay_create_buf_file(chan, chan->buf[0], 0);
+			if (dentry && !WARN_ON_ONCE(!chan->is_global)) {
+				relay_set_buf_dentry(chan->buf[0], dentry);
+				err = 0;
+			}
+		}
+		mutex_unlock(&relay_channels_mutex);
+		return err;
+	}
+
 	curr_cpu = get_cpu();
 	/*
 	 * The CPU hotplug notifier ran before us and created buffers with
@@ -706,6 +735,7 @@ int relay_late_setup_files(struct rchan *chan,
 
 	return err;
 }
+EXPORT_SYMBOL_GPL(relay_late_setup_files);
 
 /**
  *	relay_switch_subbuf - switch to a new sub-buffer

From ac3339baffd724edfb188ef57d1345d9649ba9af Mon Sep 17 00:00:00 2001
From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 2 Aug 2016 14:07:21 -0700
Subject: [PATCH 107/111] init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with
 allmodconfig

Developing patches against a compiled allmodconfig kernel and committing
to the local tree has an unfortunate consequence: the kernel version
changes (as it should), which leads to recompiling and relinking several
files even if they weren't touched (or interesting at all).  This, and
"git-whatever" figuring out the current version, slows down compilation
for no good reason.

But let's face it, "allmodconfig" kernels don't care about the kernel
version; they are simply compile-check guinea pigs.

Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
allmodconfig .config.

Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/init/Kconfig b/init/Kconfig
index 8f08f49a7c39..380798f86aae 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -81,6 +81,7 @@ config LOCALVERSION
 config LOCALVERSION_AUTO
 	bool "Automatically append version information to the version string"
 	default y
+	depends on !COMPILE_TEST
 	help
 	  This will try to automatically determine if the current tree is a
 	  release tree by looking for git tags that belong to the current

From 27eb6622ab67bad75814c9b7b08096cfb16be63a Mon Sep 17 00:00:00 2001
From: Rob Herring <robh@kernel.org>
Date: Tue, 2 Aug 2016 14:07:24 -0700
Subject: [PATCH 108/111] config: add android config fragments

Copy the config fragments from the AOSP common kernel android-4.4
branch.  It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain.  Adding the config fragments to the kernel tree
makes configuring a mainline kernel as simple as:

  make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config

The following non-upstream config options were removed:

  CONFIG_NETFILTER_XT_MATCH_QTAGUID
  CONFIG_NETFILTER_XT_MATCH_QUOTA2
  CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
  CONFIG_PPPOLAC
  CONFIG_PPPOPNS
  CONFIG_SECURITY_PERF_EVENTS_RESTRICT
  CONFIG_USB_CONFIGFS_F_MTP
  CONFIG_USB_CONFIGFS_F_PTP
  CONFIG_USB_CONFIGFS_F_ACC
  CONFIG_USB_CONFIGFS_F_AUDIO_SRC
  CONFIG_USB_CONFIGFS_UEVENT
  CONFIG_INPUT_KEYCHORD
  CONFIG_INPUT_KEYRESET

Link: http://lkml.kernel.org/r/1466708235-28593-1-git-send-email-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 MAINTAINERS                               |   5 +
 kernel/configs/android-base.config        | 152 ++++++++++++++++++++++
 kernel/configs/android-recommended.config | 121 +++++++++++++++++
 3 files changed, 278 insertions(+)
 create mode 100644 kernel/configs/android-base.config
 create mode 100644 kernel/configs/android-recommended.config

diff --git a/MAINTAINERS b/MAINTAINERS
index e9eacacf0f08..ce38536009c7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -778,6 +778,11 @@ W:	http://ez.analog.com/community/linux-device-drivers
 S:	Supported
 F:	drivers/dma/dma-axi-dmac.c
 
+ANDROID CONFIG FRAGMENTS
+M:	Rob Herring <robh@kernel.org>
+S:	Supported
+F:	kernel/configs/android*
+
 ANDROID DRIVERS
 M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 M:	Arve Hjønnevåg <arve@android.com>
diff --git a/kernel/configs/android-base.config b/kernel/configs/android-base.config
new file mode 100644
index 000000000000..9f748ed7bea8
--- /dev/null
+++ b/kernel/configs/android-base.config
@@ -0,0 +1,152 @@
+#  KEEP ALPHABETICALLY SORTED
+# CONFIG_DEVKMEM is not set
+# CONFIG_DEVMEM is not set
+# CONFIG_INET_LRO is not set
+# CONFIG_MODULES is not set
+# CONFIG_OABI_COMPAT is not set
+# CONFIG_SYSVIPC is not set
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_ANDROID_LOW_MEMORY_KILLER=y
+CONFIG_ARMV8_DEPRECATED=y
+CONFIG_ASHMEM=y
+CONFIG_AUDIT=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_DEBUG=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_CP15_BARRIER_EMULATION=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_EMBEDDED=y
+CONFIG_FB=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_INET=y
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_INET_ESP=y
+CONFIG_INET_XFRM_MODE_TUNNEL=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IPV6=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_IPV6_PRIVACY=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_NAT=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_TARGET_MASQUERADE=y
+CONFIG_IP_NF_TARGET_NETMAP=y
+CONFIG_IP_NF_TARGET_REDIRECT=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_NET=y
+CONFIG_NETDEVICES=y
+CONFIG_NETFILTER=y
+CONFIG_NETFILTER_TPROXY=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_KEY=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_NAT=y
+CONFIG_NO_HZ=y
+CONFIG_PACKET=y
+CONFIG_PM_AUTOSLEEP=y
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+CONFIG_PREEMPT=y
+CONFIG_QUOTA=y
+CONFIG_RTC_CLASS=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_SETEND_EMULATION=y
+CONFIG_STAGING=y
+CONFIG_SWP_EMULATION=y
+CONFIG_SYNC=y
+CONFIG_TUN=y
+CONFIG_UNIX=y
+CONFIG_USB_GADGET=y
+CONFIG_USB_CONFIGFS=y
+CONFIG_USB_CONFIGFS_F_FS=y
+CONFIG_USB_CONFIGFS_F_MIDI=y
+CONFIG_USB_OTG_WAKELOCK=y
+CONFIG_XFRM_USER=y
diff --git a/kernel/configs/android-recommended.config b/kernel/configs/android-recommended.config
new file mode 100644
index 000000000000..e3b953e966d2
--- /dev/null
+++ b/kernel/configs/android-recommended.config
@@ -0,0 +1,121 @@
+#  KEEP ALPHABETICALLY SORTED
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+# CONFIG_INPUT_MOUSE is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_NF_CONNTRACK_SIP is not set
+# CONFIG_PM_WAKELOCKS_GC is not set
+# CONFIG_VT is not set
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_COMPACTION=y
+CONFIG_DEBUG_RODATA=y
+CONFIG_DM_UEVENT=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_ENABLE_DEFAULT_TRACERS=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_FUSE_FS=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HIDRAW=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_GREENASIA=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_HOLTEK=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_GPIO=y
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_TABLET=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_ION=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_JOYSTICK_XPAD_FF=y
+CONFIG_JOYSTICK_XPAD_LEDS=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_KSM=y
+CONFIG_LOGIG940_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGITECH_FF=y
+CONFIG_MD=y
+CONFIG_MEDIA_SUPPORT=y
+CONFIG_MSDOS_FS=y
+CONFIG_PANIC_TIMEOUT=5
+CONFIG_PANTHERLORD_FF=y
+CONFIG_PERF_EVENTS=y
+CONFIG_PM_DEBUG=y
+CONFIG_PM_RUNTIME=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+CONFIG_POWER_SUPPLY=y
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
+CONFIG_SCHEDSTATS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_SND=y
+CONFIG_SOUND=y
+CONFIG_SUSPEND_TIME=y
+CONFIG_TABLET_USB_ACECAD=y
+CONFIG_TABLET_USB_AIPTEK=y
+CONFIG_TABLET_USB_GTCO=y
+CONFIG_TABLET_USB_HANWANG=y
+CONFIG_TABLET_USB_KBTAB=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_TASK_XACCT=y
+CONFIG_TIMER_STATS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_UHID=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_USBNET=y
+CONFIG_VFAT_FS=y

From f1cb637e75b59a07450cf81ad68b04f3f46b03d7 Mon Sep 17 00:00:00 2001
From: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Date: Tue, 2 Aug 2016 14:07:27 -0700
Subject: [PATCH 109/111] init/Kconfig: add clarification for out-of-tree
 modules

TRIM_UNUSED_KSYMS doesn't trim just the symbols that are totally unused
in-tree - it trims every symbol unused by the in-tree modules actually
built.  If you've done a 'make localmodconfig' and only build a hundred
or so modules, it's pretty likely that your out-of-tree module will come
up lacking something...

Hopefully this will save the next guy from a Homer Simpson "D'oh!"
moment.

Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 380798f86aae..69886493ff1e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2080,7 +2080,7 @@ config TRIM_UNUSED_KSYMS
 	  (especially when using LTO) for optimizing the code and reducing
 	  binary size.  This might have some security advantages as well.
 
-	  If unsure say N.
+	  If unsure, or if you need to build out-of-tree modules, say N.
 
 endif # MODULES
 

From a4691deabf284a601149a067525759939cc563b2 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@oracle.com>
Date: Tue, 2 Aug 2016 14:07:30 -0700
Subject: [PATCH 110/111] kcov: allow more fine-grained coverage
 instrumentation

For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis.  This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.

To instrument a part of the kernel, you can use either

    # for a single file in the current directory
    KCOV_INSTRUMENT_filename.o := y

or

    # for all the files in the current directory (excluding subdirectories)
    KCOV_INSTRUMENT := y

or

    # (same as above)
    ccflags-y += $(CFLAGS_KCOV)

or

    # for all the files in the current directory (including subdirectories)
    subdir-ccflags-y += $(CFLAGS_KCOV)

Link: http://lkml.kernel.org/r/1464008380-11405-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 lib/Kconfig.debug    | 11 +++++++++++
 scripts/Makefile.lib |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f07842e2d69f..cc02f282d05b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -719,6 +719,17 @@ config KCOV
 
 	  For more details, see Documentation/kcov.txt.
 
+config KCOV_INSTRUMENT_ALL
+	bool "Instrument all code by default"
+	depends on KCOV
+	default y if KCOV
+	help
+	  If you are doing generic system call fuzzing (like e.g. syzkaller),
+	  then you will want to instrument the whole kernel and you should
+	  say y here. If you are doing more targeted fuzzing (like e.g.
+	  filesystem fuzzing with AFL) then you will want to enable coverage
+	  for more specific subsets of files, and should say n here.
+
 config DEBUG_SHIRQ
 	bool "Debug shared IRQ handlers"
 	depends on DEBUG_KERNEL
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index e7df0f5db7ec..76494e15417b 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -138,7 +138,7 @@ endif
 
 ifeq ($(CONFIG_KCOV),y)
 _c_flags += $(if $(patsubst n%,, \
-	$(KCOV_INSTRUMENT_$(basetarget).o)$(KCOV_INSTRUMENT)y), \
+	$(KCOV_INSTRUMENT_$(basetarget).o)$(KCOV_INSTRUMENT)$(CONFIG_KCOV_INSTRUMENT_ALL)), \
 	$(CFLAGS_KCOV))
 endif
 

From 3bd080e4d8f2351ee3e143f0ec9307cc95ae6639 Mon Sep 17 00:00:00 2001
From: Alexey Dobriyan <adobriyan@gmail.com>
Date: Tue, 2 Aug 2016 14:07:32 -0700
Subject: [PATCH 111/111] ipc: delete "nr_ipc_ns"

nr_ipc_ns is a write-only variable: it is incremented and decremented,
but its value is never read.

Link: http://lkml.kernel.org/r/20160708214356.GA6785@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/ipc_namespace.h | 2 --
 ipc/msgutil.c                 | 2 --
 ipc/namespace.c               | 2 --
 3 files changed, 6 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 1eee6bcfcf76..d10e54f03c09 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -63,8 +63,6 @@ struct ipc_namespace {
 };
 
 extern struct ipc_namespace init_ipc_ns;
-extern atomic_t nr_ipc_ns;
-
 extern spinlock_t mq_lock;
 
 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index ed81aafd2392..a521999de4f1 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -37,8 +37,6 @@ struct ipc_namespace init_ipc_ns = {
 #endif
 };
 
-atomic_t nr_ipc_ns = ATOMIC_INIT(1);
-
 struct msg_msgseg {
 	struct msg_msgseg *next;
 	/* the next part of the message follows immediately */
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 04cb07eb81f1..d87e6baa1323 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -43,7 +43,6 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 		kfree(ns);
 		return ERR_PTR(err);
 	}
-	atomic_inc(&nr_ipc_ns);
 
 	sem_init_ns(ns);
 	msg_init_ns(ns);
@@ -96,7 +95,6 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 	sem_exit_ns(ns);
 	msg_exit_ns(ns);
 	shm_exit_ns(ns);
-	atomic_dec(&nr_ipc_ns);
 
 	put_user_ns(ns->user_ns);
 	ns_free_inum(&ns->ns);