The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
defines introduced in 54b501992d ("coredump: warn about unsafe
suid_dumpable / core_pattern combo"). Remove the new ones, and use the
prior values instead.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Chen Gang <gang.chen@asianux.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several printk's were missing KERN_INFO and KERN_CONT flags. In
addition, a printk that was outside a #if/#endif should have been
inside, which would result in stray blank line on non-x86 boxes.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The idea is simple. We need to get the siginfo for each signal on
checkpointing dump, and then return it back on restore.
The first problem is that the kernel doesn't report complete siginfos to
userspace. In a signal handler the kernel strips SI_CODE from siginfo.
When a siginfo is received from signalfd, it has a different format with
fixed sizes of fields. The interface of signalfd was extended. If a
signalfd is created with the flag SFD_RAW, it returns siginfo in a raw
format.
rt_sigqueueinfo looks suitable for restoring signals, but it can't send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.
This patch:
The kernel prevents sending of siginfo with positive si_code, because
these codes are reserved for kernel. I think we can allow a task to
send such a siginfo to itself. This operation should not be dangerous.
This functionality is required for restoring signals in
checkpoint/restart.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__orderly_poweroff() does argv_free() if call_usermodehelper_fns()
returns -ENOMEM. As Lucas pointed out, this can be wrong if -ENOMEM was
not triggered by the failing call_usermodehelper_setup(), in this case
both __orderly_poweroff() and argv_cleanup() can do kfree().
Kill argv_cleanup() and change __orderly_poweroff() to call argv_free()
unconditionally like do_coredump() does. This info->cleanup() is not
needed (and wrong) since 6c0c0d4d "fix bug in orderly_poweroff() which
did the UMH_NO_WAIT => UMH_WAIT_EXEC change, we can rely on the fact
that CLONE_VFORK can't return until do_execve() succeeds/fails.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: hongfeng <hongfeng@marvell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Pull scheduler fixes from Ingo Molnar.
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cputime: Use local_clock() for full dynticks cputime accounting
cputime: Constify timeval_to_cputime(timeval) argument
sched: Move RR_TIMESLICE from sysctl.h to rt.h
sched: Fix /proc/sched_debug failure on very very large systems
sched: Fix /proc/sched_stat failure on very very large systems
sched/core: Remove the obsolete and unused nr_uninterruptible() function
Pull drm merge from Dave Airlie:
"Highlights:
- TI LCD controller KMS driver
- TI OMAP KMS driver merged from staging
- drop gma500 stub driver
- the fbcon locking fixes
- the vgacon dirty like zebra fix.
- open firmware videomode and hdmi common code helpers
- major locking rework for kms object handling - pageflip/cursor
won't block on polling anymore!
- fbcon helper and prime helper cleanups
- i915: all over the map, haswell power well enhancements, valleyview
macro horrors cleaned up, killing lots of legacy GTT code,
- radeon: CS ioctl unification, deprecated UMS support, gpu reset
rework, VM fixes
- nouveau: reworked thermal code, external dp/tmds encoder support
(anx9805), fences sleep instead of polling,
- exynos: all over the driver fixes."
Lovely conflict in radeon/evergreen_cs.c between commit de0babd60d
("drm/radeon: enforce use of radeon_get_ib_value when reading user cmd")
and the new changes that modified that evergreen_dma_cs_parse()
function.
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (508 commits)
drm/tilcdc: only build on arm
drm/i915: Revert hdmi HDP pin checks
drm/tegra: Add list of framebuffers to debugfs
drm/tegra: Fix color expansion
drm/tegra: Split DC_CMD_STATE_CONTROL register write
drm/tegra: Implement page-flipping support
drm/tegra: Implement VBLANK support
drm/tegra: Implement .mode_set_base()
drm/tegra: Add plane support
drm/tegra: Remove bogus tegra_framebuffer structure
drm: Add consistency check for page-flipping
drm/radeon: Use generic HDMI infoframe helpers
drm/tegra: Use generic HDMI infoframe helpers
drm: Add EDID helper documentation
drm: Add HDMI infoframe helpers
video: Add generic HDMI infoframe helpers
drm: Add some missing forward declarations
drm: Move mode tables to drm_edid.c
drm: Remove duplicate drm_mode_cea_vic()
gma500: Fix n, m1 and m2 clock limits for sdvo and lvds
...
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
lockdep, but it's a mechanical change.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf
4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf
3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6
eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee
ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn
4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC
Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1
AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO
32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv
lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO
rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK
+GszxsFkCjlW0mK0egTb
=tiSY
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s <signature> option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.
Running the full dynticks cputime accounting with preemptible
kernel debugging trigger the following warning:
[ 4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
[ 4.490971] caller is native_sched_clock+0x22/0x80
[ 4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
[ 4.496376] Call Trace:
[ 4.498996] [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
[ 4.501716] [<ffffffff8101e642>] native_sched_clock+0x22/0x80
[ 4.504434] [<ffffffff8101db99>] sched_clock+0x9/0x10
[ 4.507185] [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
[ 4.509916] [<ffffffff81096dd5>] task_cputime+0x35/0x60
[ 4.512622] [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
[ 4.515372] [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
[ 4.518117] [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
[ 4.520844] [<ffffffff81867a10>] ? rest_init+0x160/0x160
[ 4.523554] [<ffffffff8117d457>] do_execve+0x37/0x40
[ 4.526276] [<ffffffff810021a3>] run_init_process+0x23/0x30
[ 4.528953] [<ffffffff81867aac>] kernel_init+0x9c/0xf0
[ 4.531608] [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0
We use sched_clock() to perform and fixup the cputime
accounting. However we are calling it with preemption enabled
from the read side, which trigger the bug above.
To fix this up, use local_clock() instead. It takes care of
preemption and also provide a more reliable clock source. This
is welcome for this kind of statistic that is widely relied on
in userspace.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
contain SYSCALL_DEFINE-related patches.
- a bunch of signal-related syscalls (both native and compat)
unified.
- a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
(fixing several potential problems with missing argument
validation, while we are at it)
- a lot of now-pointless wrappers killed
- a couple of architectures (cris and hexagon) forgot to save
altstack settings into sigframe, even though they used the
(uninitialized) values in sigreturn; fixed.
- microblaze fixes for delivery of multiple signals arriving at once
- saner set of helpers for signal delivery introduced, several
architectures switched to using those."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
x86: convert to ksignal
sparc: convert to ksignal
arm: switch to struct ksignal * passing
alpha: pass k_sigaction and siginfo_t using ksignal pointer
burying unused conditionals
make do_sigaltstack() static
arm64: switch to generic old sigaction() (compat-only)
arm64: switch to generic compat rt_sigaction()
arm64: switch compat to generic old sigsuspend
arm64: switch to generic compat rt_sigqueueinfo()
arm64: switch to generic compat rt_sigpending()
arm64: switch to generic compat rt_sigprocmask()
arm64: switch to generic sigaltstack
sparc: switch to generic old sigsuspend
sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
sparc: kill sign-extending wrappers for native syscalls
kill sparc32_open()
sparc: switch to use of generic old sigaction
sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
mips: switch to generic sys_fork() and sys_clone()
...
Merge second patch-bomb from Andrew Morton:
- A little DM fix
- the MM queue
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
ksm: allocate roots when needed
mm: cleanup "swapcache" in do_swap_page
mm,ksm: swapoff might need to copy
mm,ksm: FOLL_MIGRATION do migration_entry_wait
ksm: shrink 32-bit rmap_item back to 32 bytes
ksm: treat unstable nid like in stable tree
ksm: add some comments
tmpfs: fix mempolicy object leaks
tmpfs: fix use-after-free of mempolicy object
mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
mm: export mmu notifier invalidates
mm: accelerate mm_populate() treatment of THP pages
mm: use long type for page counts in mm_populate() and get_user_pages()
mm: accurately document nr_free_*_pages functions with code comments
HWPOISON: change order of error_states[]'s elements
HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
memcg: stop warning on memcg_propagate_kmem
net: change type of virtio_chan->p9_max_pages
vmscan: change type of vm_total_pages to unsigned long
fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
...
When calculating amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.
Addresses http://bugs.debian.org/695182
[akpm@linux-foundation.org: fix up min_free_kbytes extern declarations]
[akpm@linux-foundation.org: fix min() warning]
Signed-off-by: Paul Szabo <psz@maths.usyd.edu.au>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu)
will return -1. As a result, cpumask_of_node(nid) will return NULL. In
this case, find_next_bit() in for_each_cpu will get a NULL pointer and
cause panic.
Here is a call trace:
Call Trace:
<IRQ>
select_fallback_rq+0x71/0x190
try_to_wake_up+0x2cb/0x2f0
wake_up_process+0x15/0x20
hrtimer_wakeup+0x22/0x30
__run_hrtimer+0x83/0x320
hrtimer_interrupt+0x106/0x280
smp_apic_timer_interrupt+0x69/0x99
apic_timer_interrupt+0x6f/0x80
There is a hrtimer process sleeping, whose cpu has already been
offlined. When it is waken up, it tries to find another cpu to run, and
get a -1 nid. As a result, cpumask_of_node(-1) returns NULL, and causes
ernel panic.
This patch fixes this problem by judging if the nid is -1. If nid is
not -1, a cpu on the same node will be picked. Else, a online cpu on
another node will be picked.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core locking changes from Ingo Molnar:
"The biggest change is the rwsem lock-steal improvements, both to the
assembly optimized and the spinlock based variants.
The other notable change is the clean up of the seqlock implementation
to be based on the seqcount infrastructure.
The rest is assorted smaller debuggability, cleanup and continued -rt
locking changes."
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rwsem-spinlock: Implement writer lock-stealing for better scalability
futex: Revert "futex: Mark get_robust_list as deprecated"
generic: Use raw local irq variant for generic cmpxchg
lockdep: Selftest: convert spinlock to raw spinlock
seqlock: Use seqcount infrastructure
seqlock: Remove unused functions
ntp: Make ntp_lock raw
intel_idle: Convert i7300_idle_lock to raw_spinlock
locking: Various static lock initializer fixes
lockdep: Print more info when MAX_LOCK_DEPTH is exceeded
rwsem: Implement writer lock-stealing for better scalability
lockdep: Silence warning if CONFIG_LOCKDEP isn't set
watchdog: Use local_clock for get_timestamp()
lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug()
locking/stat: Fix a typo
On systems with 4096 cores attemping to read /proc/sched_debug
fails because we are trying to push all the data into a single
kmalloc buffer.
The issue is on these very large machines all the data will not
fit in 4mb.
A better solution is to not us the single_open mechanism but to
provide our own seq_operations and treat each cpu as an
individual record.
The output should be identical to the previous version.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>)
[ Whitespace fixlet]
[ Fix spello in comment]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On systems with 4096 cores doing a cat /proc/sched_stat fails,
because we are trying to push all the data into a single kmalloc
buffer.
The issue is on these very large machines all the data will not
fit in 4mb.
A better solution is to not use the single_open() mechanism but
to provide our own seq_operations.
The output should be identical to previous version and thus not
need the version number.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
[ Fix memleak]
[ Fix spello in comment]
[ Fix warnings]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 mm changes from Peter Anvin:
"This is a huge set of several partly interrelated (and concurrently
developed) changes, which is why the branch history is messier than
one would like.
The *really* big items are two humonguous patchsets mostly developed
by Yinghai Lu at my request, which completely revamps the way we
create initial page tables. In particular, rather than estimating how
much memory we will need for page tables and then build them into that
memory -- a calculation that has shown to be incredibly fragile -- we
now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
a #PF handler which creates temporary page tables on demand.
This has several advantages:
1. It makes it much easier to support things that need access to data
very early (a followon patchset uses this to load microcode way
early in the kernel startup).
2. It allows the kernel and all the kernel data objects to be invoked
from above the 4 GB limit. This allows kdump to work on very large
systems.
3. It greatly reduces the difference between Xen and native (Xen's
equivalent of the #PF handler are the temporary page tables created
by the domain builder), eliminating a bunch of fragile hooks.
The patch series also gets us a bit closer to W^X.
Additional work in this pull is the 64-bit get_user() work which you
were also involved with, and a bunch of cleanups/speedups to
__phys_addr()/__pa()."
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
x86, mm: Move reserving low memory later in initialization
x86, doc: Clarify the use of asm("%edx") in uaccess.h
x86, mm: Redesign get_user with a __builtin_choose_expr hack
x86: Be consistent with data size in getuser.S
x86, mm: Use a bitfield to mask nuisance get_user() warnings
x86/kvm: Fix compile warning in kvm_register_steal_time()
x86-32: Add support for 64bit get_user()
x86-32, mm: Remove reference to alloc_remap()
x86-32, mm: Remove reference to resume_map_numa_kva()
x86-32, mm: Rip out x86_32 NUMA remapping code
x86/numa: Use __pa_nodebug() instead
x86: Don't panic if can not alloc buffer for swiotlb
mm: Add alloc_bootmem_low_pages_nopanic()
x86, 64bit, mm: hibernate use generic mapping_init
x86, 64bit, mm: Mark data/bss/brk to nx
x86: Merge early kernel reserve for 32bit and 64bit
x86: Add Crash kernel low reservation
x86, kdump: Remove crashkernel range find limit for 64bit
memblock: Add memblock_mem_size()
x86, boot: Not need to check setup_header version for setup_data
...
Merge misc patches from Andrew Morton:
- Florian has vanished so I appear to have become fbdev maintainer
again :(
- Joel and Mark are distracted to welcome to the new OCFS2 maintainer
- The backlight queue
- Small core kernel changes
- lib/ updates
- The rtc queue
- Various random bits
* akpm: (164 commits)
rtc: rtc-davinci: use devm_*() functions
rtc: rtc-max8997: use devm_request_threaded_irq()
rtc: rtc-max8907: use devm_request_threaded_irq()
rtc: rtc-da9052: use devm_request_threaded_irq()
rtc: rtc-wm831x: use devm_request_threaded_irq()
rtc: rtc-tps80031: use devm_request_threaded_irq()
rtc: rtc-lp8788: use devm_request_threaded_irq()
rtc: rtc-coh901331: use devm_clk_get()
rtc: rtc-vt8500: use devm_*() functions
rtc: rtc-tps6586x: use devm_request_threaded_irq()
rtc: rtc-imxdi: use devm_clk_get()
rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
rtc: rtc-pcf8583: use dev_warn() instead of printk()
rtc: rtc-sun4v: use pr_warn() instead of printk()
rtc: rtc-vr41xx: use dev_info() instead of printk()
rtc: rtc-rs5c313: use pr_err() instead of printk()
rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
rtc: rtc-ds2404: use dev_err() instead of printk()
rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
...
We can use user_ns, which is also assigned from task_cred_xxx(tsk,
user_ns), at the beginning of copy_namespaces().
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove a tabstop from the switch statement, in the usual fashion. A few
instances of weirdwrapping were removed as a result.
Cc: Chen Gang <gang.chen@asianux.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arg2 will never < 0, for its type is 'unsigned long'
Also, use the provided macros.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm testing swapout workload in a two-socket Xeon machine. The workload
has 10 threads, each thread sequentially accesses separate memory
region. TLB flush overhead is very big in the workload. For each page,
page reclaim need move it from active lru list and then unmap it. Both
need a TLB flush. And this is a multthread workload, TLB flush happens
in 10 CPUs. In X86, TLB flush uses generic smp_call)function. So this
workload stress smp_call_function_many heavily.
Without patch, perf shows:
+ 24.49% [k] generic_smp_call_function_interrupt
- 21.72% [k] _raw_spin_lock
- _raw_spin_lock
+ 79.80% __page_check_address
+ 6.42% generic_smp_call_function_interrupt
+ 3.31% get_swap_page
+ 2.37% free_pcppages_bulk
+ 1.75% handle_pte_fault
+ 1.54% put_super
+ 1.41% grab_super_passive
+ 1.36% __swap_duplicate
+ 0.68% blk_flush_plug_list
+ 0.62% swap_info_get
+ 6.55% [k] flush_tlb_func
+ 6.46% [k] smp_call_function_many
+ 5.09% [k] call_function_interrupt
+ 4.75% [k] default_send_IPI_mask_sequence_phys
+ 2.18% [k] find_next_bit
swapout throughput is around 1300M/s.
With the patch, perf shows:
- 27.23% [k] _raw_spin_lock
- _raw_spin_lock
+ 80.53% __page_check_address
+ 8.39% generic_smp_call_function_single_interrupt
+ 2.44% get_swap_page
+ 1.76% free_pcppages_bulk
+ 1.40% handle_pte_fault
+ 1.15% __swap_duplicate
+ 1.05% put_super
+ 0.98% grab_super_passive
+ 0.86% blk_flush_plug_list
+ 0.57% swap_info_get
+ 8.25% [k] default_send_IPI_mask_sequence_phys
+ 7.55% [k] call_function_interrupt
+ 7.47% [k] smp_call_function_many
+ 7.25% [k] flush_tlb_func
+ 3.81% [k] _raw_spin_lock_irqsave
+ 3.78% [k] generic_smp_call_function_single_interrupt
swapout throughput is around 1400M/s. So there is around a 7%
improvement, and total cpu utilization doesn't change.
Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt does read/write cfd_data several times
which will create a lot of cache ping-pong. With the patch, the data
becomes per-cpu. The ping-pong is avoided. And from the perf data, this
doesn't make call_single_queue lock contend.
Next step is to remove generic_smp_call_function_interrupt() from arch
code.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
How is the compiler even handling exported functions that are marked
inline? Anyway, these shouldn't be inline because of that, so remove
that marking.
Based on a larger patch by Mark Charlebois to get LLVM to build the
kernel.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Charlebois <mcharleb@qualcomm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: hank <pyu@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The copy_to_user() call returns the number of bytes remaining but we
want to return -EFAULT on error.
Fixes "x32: fix waitid()"
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A large number of cleanups, all over the platforms. This is dominated
largely by the Samsung platforms (s3c, s5p, exynos) and a few of the
others moving code out of arch/arm into more appropriate subsystems.
The clocksource and irqchip drivers are now abstracted to the point
where platforms that are already cleaned up do not need to even specify
the driver they use, it can all get configured from the device tree
as we do for normal device drivers. The clocksource changes basically
touch every single platform in the process.
We further clean up the use of platform specific header files here,
with the goal of turning more of the platforms over to being
"multiplatform" enabled, which implies that they cannot expose
their headers to architecture independent code any more.
It is expected that no functional changes are part of the cleanup.
The overall reduction in total code lines is mostly the result of
removing broken and obsolete code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAUSUyKmCrR//JCVInAQIN8RAAnb/uPytmlMjn5yCksF4Mvb/FVbn/TVwz
KRIGpCHOzyKK1q7pM8NRUVWfjW2SZqbXJFqx6zBGKSlDPvFTOhsLyyupU+Tnyu5W
IX4eIUBwb+a6H7XDHw0X2YI8uHzi5RNLhne0A1QyDKcnuHs1LDAttXnJHaK4Ap6Y
NN2YFt3l3ld7DXWXJtMsw5v8lC10aeIFGTvXefaPDAdeMLivmI57qEUMDXknNr7W
Odz/Rc0/cw3BNBVl/zNHA0jw7FOjKAymCYYNUa4xDCJEr+JnIRTqizd0N/YIIC7x
aA2xjJ3oKUFyF51yiJE6nFuTyJznhwtehc+uiMOSIkjrPLym52LEHmd7G5Yqlmjz
oiei09qBb870q3lGxwfht9iaeIwYgQFYGfD0yW5QWArCO5pxhtCPLPH7YZNZtcQd
ZJRSGGqT/ljBz3bm0K9OLESeeTTN7+Nxvtpiz/CD+Piegz0gWJzDYJRTzkJ3UWpA
WTVhVQdWUeX2JrNkgM7Z3Tu8iXOe+LIEs7kVXGJZSREmIIZiRvR36UrODZtAkp9I
7YQ+srX/uaR832pgK0RrHK0zY0psU6MmIvhYxJZFbx7keiPA9eH6drb0x7tGqcUD
FzEUzvcZvyqppndfBi+R60H/YKAhJDEXdwxzo6dyCpPQaW1T9GnzIqXuE1zin+Aw
X7Y8YywMbHI=
=DvgJ
-----END PGP SIGNATURE-----
Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Arnd Bergmann:
"A large number of cleanups, all over the platforms. This is dominated
largely by the Samsung platforms (s3c, s5p, exynos) and a few of the
others moving code out of arch/arm into more appropriate subsystems.
The clocksource and irqchip drivers are now abstracted to the point
where platforms that are already cleaned up do not need to even
specify the driver they use, it can all get configured from the device
tree as we do for normal device drivers. The clocksource changes
basically touch every single platform in the process.
We further clean up the use of platform specific header files here,
with the goal of turning more of the platforms over to being
"multiplatform" enabled, which implies that they cannot expose their
headers to architecture independent code any more.
It is expected that no functional changes are part of the cleanup.
The overall reduction in total code lines is mostly the result of
removing broken and obsolete code."
* tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (133 commits)
ARM: mvebu: correct gated clock documentation
ARM: kirkwood: add missing include for nsa310
ARM: exynos: move exynos4210-combiner to drivers/irqchip
mfd: db8500-prcmu: update resource passing
drivers/db8500-cpufreq: delete dangling include
ARM: at91: remove NEOCORE 926 board
sunxi: Cleanup the reset code and add meaningful registers defines
ARM: S3C24XX: header mach/regs-mem.h local
ARM: S3C24XX: header mach/regs-power.h local
ARM: S3C24XX: header mach/regs-s3c2412-mem.h local
ARM: S3C24XX: Remove plat-s3c24xx directory in arch/arm/
ARM: S3C24XX: transform s3c2443 subirqs into new structure
ARM: S3C24XX: modify s3c2443 irq init to initialize all irqs
ARM: S3C24XX: move s3c2443 irq code to irq.c
ARM: S3C24XX: transform s3c2416 irqs into new structure
ARM: S3C24XX: modify s3c2416 irq init to initialize all irqs
ARM: S3C24XX: move s3c2416 irq init to common irq code
ARM: S3C24XX: Modify s3c_irq_wake to use the hwirq property
ARM: S3C24XX: Move irq syscore-ops to irq-pm
clocksource: always define CLOCKSOURCE_OF_DECLARE
...
Here's the big tty/serial driver patches for 3.9-rc1.
More tty port rework and fixes from Jiri here, as well as lots of
individual serial driver updates and fixes.
All of these have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmZYQACgkQMUfUDdst+ylJDgCg0B0nMevUUdM4hLvxunbbiyXM
HUEAoIOedqriNNPvX4Bwy0hjeOEaWx0g
=vi6x
-----END PGP SIGNATURE-----
Merge tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial patches from Greg Kroah-Hartman:
"Here's the big tty/serial driver patches for 3.9-rc1.
More tty port rework and fixes from Jiri here, as well as lots of
individual serial driver updates and fixes.
All of these have been in the linux-next tree for a while."
* tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
tty: mxser: improve error handling in mxser_probe() and mxser_module_init()
serial: imx: fix uninitialized variable warning
serial: tegra: assume CONFIG_OF
TTY: do not update atime/mtime on read/write
lguest: select CONFIG_TTY to build properly.
ARM defconfigs: add missing inclusions of linux/platform_device.h
fb/exynos: include platform_device.h
ARM: sa1100/assabet: include platform_device.h directly
serial: imx: Fix recursive locking bug
pps: Fix build breakage from decoupling pps from tty
tty: Remove ancient hardpps()
pps: Additional cleanups in uart_handle_dcd_change
pps: Move timestamp read into PPS code proper
pps: Don't crash the machine when exiting will do
pps: Fix a use-after free bug when unregistering a source.
pps: Use pps_lookup_dev to reduce ldisc coupling
pps: Add pps_lookup_dev() function
tty: serial: uartlite: Support uartlite on big and little endian systems
tty: serial: uartlite: Fix sparse and checkpatch warnings
serial/arc-uart: Miscll DT related updates (Grant's review comments)
...
Fix up trivial conflicts, mostly just due to the TTY config option
clashing with the EXPERIMENTAL removal.
Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers all
over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
If you need me to provide a merged tree to handle these resolutions,
please let me know.
Other than those patches, there's not much here, some minor fixes and
updates.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
=yWAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
When idr_find() was fed a negative ID, it used to look up the ID
ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and
move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers
a WARN_ON_ONCE().
__lock_timer() feeds timer_id from userland directly to idr_find()
without sanitizing it which can trigger the above malfunctions. Add a
range check on @timer_id before invoking idr_find() in __lock_timer().
While timer_t is defined as int by all archs at the moment, Andrew
worries that it may be defined as a larger type later on. Make the
test cover larger integers too so that it at least is guaranteed to
not return the wrong timer.
Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional
precaution while moving away from ignoring MSB. Once it's gone we can
remove the guard as long as timer_t isn't larger than int.
Signed-off-by: Tejun Heo <tj@kernel.org>nnn
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull networking update from David Miller:
1) Checkpoint/restarted TCP sockets now can properly propagate the TCP
timestamp offset. From Andrey Vagin.
2) VMWARE VM VSOCK layer, from Andy King.
3) Much improved support for virtual functions and SR-IOV in bnx2x,
from Ariel ELior.
4) All protocols on ipv4 and ipv6 are now network namespace aware, and
all the compatability checks for initial-namespace-only protocols is
removed. Thanks to Tom Parkin for helping deal with the last major
holdout, L2TP.
5) IPV6 support in netpoll and network namespace support in pktgen,
from Cong Wang.
6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration
Protocol (MVRP) support, from David Ward.
7) Compute packet lengths more accurately in the packet scheduler, from
Eric Dumazet.
8) Use per-task page fragment allocator in skb_append_datato_frags(),
also from Eric Dumazet.
9) Add support for connection tracking labels in netfilter, from
Florian Westphal.
10) Fix default multicast group joining on ipv6, and add anti-spoofing
checks to 6to4 and 6rd. From Hannes Frederic Sowa.
11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern
times, rearrange inet frag datastructures for better cacheline
locality, and move more operations outside of locking. From Jesper
Dangaard Brouer.
12) Instead of strict master <--> slave relationships, allow arbitrary
scenerios with "upper device lists". From Jiri Pirko.
13) Improve rate limiting accuracy in TBF and act_police, also from Jiri
Pirko.
14) Add a BPF filter netfilter match target, from Willem de Bruijn.
15) Orphan and delete a bunch of pre-historic networking drivers from
Paul Gortmaker.
16) Add TSO support for GRE tunnels, from Pravin B SHelar. Although
this still needs some minor bug fixing before it's %100 correct in
all cases.
17) Handle unresolved IPSEC states like ARP, with a resolution packet
queue. From Steffen Klassert.
18) Remove TCP Appropriate Byte Count support (ABC), from Stephen
Hemminger. This was long overdue.
19) Support SO_REUSEPORT, from Tom Herbert.
20) Allow locking a socket BPF filter, so that it cannot change after a
process drops capabilities.
21) Add VLAN filtering to bridge, from Vlad Yasevich.
22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in
the ipv6 routes, from YOSHIFUJI Hideaki.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits)
ipv6: fix race condition regarding dst->expires and dst->from.
net: fix a wrong assignment in skb_split()
ip_gre: remove an extra dst_release()
ppp: set qdisc_tx_busylock to avoid LOCKDEP splat
atl1c: restore buffer state
net: fix a build failure when !CONFIG_PROC_FS
net: ipv4: fix waring -Wunused-variable
net: proc: fix build failed when procfs is not configured
Revert "xen: netback: remove redundant xenvif_put"
net: move procfs code to net/core/net-procfs.c
qmi_wwan, cdc-ether: add ADU960S
bonding: set sysfs device_type to 'bond'
bonding: fix bond_release_all inconsistencies
b44: use netdev_alloc_skb_ip_align()
xen: netback: remove redundant xenvif_put
net: fec: Do a sanity check on the gpio number
ip_gre: propogate target device GSO capability to the tunnel device
ip_gre: allow CSUM capable devices to handle packets
bonding: Fix initialize after use for 3ad machine state spinlock
bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate()
...
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from
Rafael J. Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng
with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and
Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri
with contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from
Dirk Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King,
Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei,
Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo,
Thomas Renninger, and Yasuaki Ishimatsu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K
9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77
s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK
bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv
eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C
s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk
5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/
hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1
8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0
5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA
wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS
E2oZDuyewTJxaskzYsNr
=wijn
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from Rafael
J Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng with
contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with
contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from Dirk
Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso,
Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu,
Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki
Ishimatsu.
* tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits)
PM idle: remove global declaration of pm_idle
unicore32 idle: delete stray pm_idle comment
openrisc idle: delete pm_idle
mn10300 idle: delete pm_idle
microblaze idle: delete pm_idle
m32r idle: delete pm_idle, and other dead idle code
ia64 idle: delete pm_idle
cris idle: delete idle and pm_idle
ARM64 idle: delete pm_idle
ARM idle: delete pm_idle
blackfin idle: delete pm_idle
sparc idle: rename pm_idle to sparc_idle
sh idle: rename global pm_idle to static sh_idle
x86 idle: rename global pm_idle to static x86_idle
APM idle: register apm_cpu_idle via cpuidle
cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.
cpufreq / intel_pstate: Change to disallow module build
tools/power turbostat: display SMI count by default
intel_idle: export both C1 and C1E
ACPI / hotplug: Fix concurrency issues and memory leaks
...
Pull cpuset changes from Tejun Heo:
- Synchornization has seen a lot of changes with focus on decoupling
cpuset synchronization from cgroup internal locking.
After this change, there only remain a couple of mostly trivial
dependencies on cgroup_lock outside cgroup core proper. cgroup_lock
is scheduled to be unexported in this devel cycle.
This will finally remove the fragile locking order around cgroup
(cgroup locking wants to / should be one of the outermost but yet has
been acquired from deep inside individual controllers).
- At this point, Li is most knowlegeable with cpuset and taking over
the maintainership of cpuset.
* 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: drop spurious retval assignment in proc_cpuset_show()
cpuset: fix RCU lockdep splat
cpuset: update MAINTAINERS
cpuset: remove cpuset->parent
cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()
cpuset: replace cgroup_mutex locking with cpuset internal locking
cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty
cpuset: pin down cpus and mems while a task is being attached
cpuset: make CPU / memory hotplug propagation asynchronous
cpuset: drop async_rebuild_sched_domains()
cpuset: don't nest cgroup_mutex inside get_online_cpus()
cpuset: reorganize CPU / memory hotplug handling
cpuset: cleanup cpuset[_can]_attach()
cpuset: introduce cpuset_for_each_child()
cpuset: introduce CS_ONLINE
cpuset: introduce ->css_on/offline()
cpuset: remove fast exit path from remove_tasks_in_empty_cpuset()
cpuset: remove unused cpuset_unlock()
Pull cgroup changes from Tejun Heo:
"Nothing too drastic.
- Removal of synchronize_rcu() from userland visible paths.
- Various fixes and cleanups from Li.
- cgroup_rightmost_descendant() added which will be used by cpuset
changes (it will be a separate pull request)."
* 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fail if monitored file and event_control are in different cgroup
cgroup: fix cgroup_rmdir() vs close(eventfd) race
cpuset: fix cpuset_print_task_mems_allowed() vs rename() race
cgroup: fix exit() vs rmdir() race
cgroup: remove bogus comments in cgroup_diput()
cgroup: remove synchronize_rcu() from cgroup_diput()
cgroup: remove duplicate RCU free on struct cgroup
sched: remove redundant NULL cgroup check in task_group_path()
sched: split out css_online/css_offline from tg creation/destruction
cgroup: initialize cgrp->dentry before css_alloc()
cgroup: remove a NULL check in cgroup_exit()
cgroup: fix bogus kernel warnings when cgroup_create() failed
cgroup: remove synchronize_rcu() from rebind_subsystems()
cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}()
cgroup: use new hashtable implementation
cgroups: fix cgroup_event_listener error handling
cgroups: move cgroup_event_listener.c to tools/cgroup
cgroup: implement cgroup_rightmost_descendant()
cgroup: remove unused dummy cgroup_fork_callbacks()
Pull async changes from Tejun Heo:
"These are followups for the earlier deadlock issue involving async
ending up waiting for itself through block requesting module[1]. The
following changes are made by these commits.
- Instead of requesting default elevator on each request_queue init,
block now requests it once early during boot.
- Kmod triggers warning if invoked from an async worker.
- Async synchronization implementation has been reimplemented. It's
a lot simpler now."
* 'for-3.9-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
async: initialise list heads to fix crash
async: replace list of active domains with global list of pending items
async: keep pending tasks on async_domain and remove async_pending
async: use ULLONG_MAX for infinity cookie value
async: bring sanity to the use of words domain and running
async, kmod: warn on synchronous request_module() from async workers
block: don't request module during elevator init
init, block: try to load default elevator module early during boot
Pull workqueue changes from Tejun Heo:
"A lot of reorganization is going on mostly to prepare for worker pools
with custom attributes so that workqueue can replace custom pool
implementations in places including writeback and btrfs and make CPU
assignment in crypto more flexible.
workqueue evolved from purely per-cpu design and implementation, so
there are a lot of assumptions regarding being bound to CPUs and even
unbound workqueues are implemented as an extension of the model -
workqueues running on the special unbound CPU. Bulk of changes this
round are about promoting worker_pools as the top level abstraction
replacing global_cwq (global cpu workqueue). At this point, I'm
fairly confident about getting custom worker pools working pretty soon
and ready for the next merge window.
Lai's patches are replacing the convoluted mb() dancing workqueue has
been doing with much simpler mechanism which only depends on
assignment atomicity of long. For details, please read the commit
message of 0b3dae68ac ("workqueue: simplify is-work-item-queued-here
test"). While the change ends up adding one pointer to struct
delayed_work, the inflation in percentage is less than five percent
and it decouples delayed_work logic a lot more cleaner from usual work
handling, removes the unusual memory barrier dancing, and allows for
further simplification, so I think the trade-off is acceptable.
There will be two more workqueue related pull requests and there are
some shared commits among them. I'll write further pull requests
assuming this pull request is pulled first."
* 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (37 commits)
workqueue: un-GPL function delayed_work_timer_fn()
workqueue: rename cpu_workqueue to pool_workqueue
workqueue: reimplement is_chained_work() using current_wq_worker()
workqueue: fix is_chained_work() regression
workqueue: pick cwq instead of pool in __queue_work()
workqueue: make get_work_pool_id() cheaper
workqueue: move nr_running into worker_pool
workqueue: cosmetic update in try_to_grab_pending()
workqueue: simplify is-work-item-queued-here test
workqueue: make work->data point to pool after try_to_grab_pending()
workqueue: add delayed_work->wq to simplify reentrancy handling
workqueue: make work_busy() test WORK_STRUCT_PENDING first
workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END
workqueue: post global_cwq removal cleanups
workqueue: rename nr_running variables
workqueue: remove global_cwq
workqueue: remove worker_pool->gcwq
workqueue: replace for_each_worker_pool() with for_each_std_worker_pool()
workqueue: make freezing/thawing per-pool
workqueue: make hotplug processing per-pool
...
Pull workqueue [delayed_]work_pending() cleanups from Tejun Heo:
"This is part of on-going cleanups to remove / minimize usages of
workqueue interfaces which are deprecated and/or misleading.
This round drops a number of usages of [delayed_]work_pending(), which
are dangerous as they lack any form of synchronization and thus often
lead to buggy / unnecessary code. There are a couple legitimate use
cases in kernel. Hopefully, they can be converted and
[delayed_]work_pending() can be removed completely. Even if not,
removing most of misuses should make it more difficult to find
examples of misuses and thus slow down growth of them.
These changes are independent from other workqueue changes."
* 'for-3.9-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
wimax/i2400m: fix i2400m->wake_tx_skb handling
kprobes: fix wait_for_kprobe_optimizer()
ipw2x00: simplify scan_event handling
video/exynos: don't use [delayed_]work_pending()
tty/max3100: don't use [delayed_]work_pending()
x86/mce: don't use [delayed_]work_pending()
rfkill: don't use [delayed_]work_pending()
wl1251: don't use [delayed_]work_pending()
thinkpad_acpi: don't use [delayed_]work_pending()
mwifiex: don't use [delayed_]work_pending()
sja1000: don't use [delayed_]work_pending()
Pull two x86 kernel build changes from Ingo Molnar:
"The first change modifies how 'make oldconfig' works on cross-bitness
situations on x86. It was felt the new behavior of preserving the
bitness of the .config is more logical. This is a leftover of the
merge.
The second change eliminates a Perl warning. (There's another, more
complete fix resulting of this warning fix, which second fix in flight
to you via the kbuild tree, which will remove the timeconst.pl script
altogether.)"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timeconst.pl: Eliminate Perl warning
x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT
Pull x86/apic changes from Ingo Molnar:
"Main changes:
- Multiple MSI support added to the APIC, PCI and AHCI code - acked
by all relevant maintainers, by Alexander Gordeev.
The advantage is that multiple AHCI ports can have multiple MSI
irqs assigned, and can thus spread to multiple CPUs.
[ Drivers can make use of this new facility via the
pci_enable_msi_block_auto() method ]
- x86 IOAPIC code from interrupt remapping cleanups from Joerg
Roedel:
These patches move all interrupt remapping specific checks out of
the x86 core code and replaces the respective call-sites with
function pointers. As a result the interrupt remapping code is
better abstraced from x86 core interrupt handling code.
- Various smaller improvements, fixes and cleanups."
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess
x86, kvm: Fix intialization warnings in kvm.c
x86, irq: Move irq_remapped out of x86 core code
x86, io_apic: Introduce eoi_ioapic_pin call-back
x86, msi: Introduce x86_msi.compose_msi_msg call-back
x86, irq: Introduce setup_remapped_irq()
x86, irq: Move irq_remapped() check into free_remapped_irq
x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq()
x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core
x86, irq: Add data structure to keep AMD specific irq remapping information
x86, irq: Move irq_remapping_enabled declaration to iommu code
x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin
x86, io_apic: Move irq_remapping_enabled checks out of check_timer()
x86, io_apic: Convert setup_ioapic_entry to function pointer
x86, io_apic: Introduce set_affinity function pointer
x86, msi: Use IRQ remapping specific setup_msi_irqs routine
x86, hpet: Introduce x86_msi_ops.setup_hpet_msi
x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging
x86, io_apic: Introduce x86_io_apic_ops.disable()
x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume
...
Pull timer changes from Ingo Molnar:
"Main changes:
- ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility
complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to
keep the hardware clock updated.
- posix-timers: Fix clock_adjtime to always return timex data on
success. This is changing the ABI, but no breakage was expected
and found - caution is warranted nevertheless.
- platform persistent clock improvements/cleanups.
- clockevents: refactor timer broadcast handling to be more generic
and less duplicated with matching architecture code (mostly ARM
motivated.)
- various fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet()
posix-cpu-timers: Fix nanosleep task_struct leak
clockevents: Fix generic broadcast for FEAT_C3STOP
time, Fix setting of hardware clock in NTP code
hrtimer: Prevent hrtimer_enqueue_reprogram race
clockevents: Add generic timer broadcast function
clockevents: Add generic timer broadcast receiver
timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK
x86/time/rtc: Don't print extended CMOS year when reading RTC
x86: Select HAS_PERSISTENT_CLOCK on x86
timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option
rtc: Skip the suspend/resume handling if persistent clock exist
timekeeping: Add persistent_clock_exist flag
posix-timers: Fix clock_adjtime to always return timex data on success
Round the calculated scale factor in set_cyc2ns_scale()
NTP: Add a CONFIG_RTC_SYSTOHC configuration
MAINTAINERS: Update John Stultz's email
time: create __getnstimeofday for WARNless calls
Pull preparatory smp/hotplug patches from Ingo Molnar:
"Some early preparatory changes for the WIP hotplug rework by Thomas
Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Use smpboot threads
stop_machine: Store task reference in a separate per cpu variable
smpboot: Allow selfparking per cpu threads
Pull scheduler changes from Ingo Molnar:
"Main changes:
- scheduler side full-dynticks (user-space execution is undisturbed
and receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready, from Frederic
Weisbecker.
- Initial sched.h split-up changes, by Clark Williams
- select_idle_sibling() performance improvement by Mike Galbraith:
" 1 tbench pair (worst case) in a 10 core + SMT package:
pre 15.22 MB/sec 1 procs
post 252.01 MB/sec 1 procs "
- sched_rr_get_interval() ABI fix/change. We think this detail is not
used by apps (so it's not an ABI in practice), but lets keep it
under observation.
- misc RT scheduling cleanups, optimizations"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h>
cputime: Remove irqsave from seqlock readers
sched, powerpc: Fix sched.h split-up build failure
cputime: Restore CPU_ACCOUNTING config defaults for PPC64
sched/rt: Move rt specific bits into new header file
sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
sched: Move sched.h sysctl bits into separate header
sched: Fix signedness bug in yield_to()
sched: Fix select_idle_sibling() bouncing cow syndrome
sched/rt: Further simplify pick_rt_task()
sched/rt: Do not account zero delta_exec in update_curr_rt()
cputime: Safely read cputime of full dynticks CPUs
kvm: Prepare to add generic guest entry/exit callbacks
cputime: Use accessors to read task cputime stats
cputime: Allow dynamic switch between tick/virtual based cputime accounting
cputime: Generic on-demand virtual cputime accounting
cputime: Move default nsecs_to_cputime() to jiffies based cputime file
cputime: Librarize per nsecs resolution cputime definitions
cputime: Avoid multiplication overflow on utime scaling
context_tracking: Export context state for generic vtime
...
Fix up conflict in kernel/context_tracking.c due to comment additions.