linux

Author	SHA1	Message	Date
Linus Torvalds	a8006bd915	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Fix four timer locking races: two were noticed by Linus while reviewing the code while chasing for a corruption bug, and two from fixing spurious USB timeouts" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Prevent base clock corruption when forwarding timers: Prevent base clock rewind when forwarding clock timers: Lock base for same bucket optimization timers: Plug locking race vs. timer migration	2016-10-28 11:26:01 -07:00
David S. Miller	c2e169be8c	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw: Couple of fixes Couple of LPM tree management fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-10-28 13:43:56 -04:00
Jiri Pirko	8b99becdc8	mlxsw: spectrum_router: Compare only trees which are in use during tree get Only trees which are in use should be compared to requested prefix usage. Fixes: `53342023ee` ("mlxsw: spectrum_router: Implement LPM trees management") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-10-28 13:43:56 -04:00
Jiri Pirko	2083d36790	mlxsw: spectrum_router: Save requested prefix bitlist when creating tree Currently, the prefix bitlist is not saved for LPM trees, causing the compare to always fail which causes the tree to be destroyed and created for every inserted and removed FIB entry. So fix this by saving the bitlist as it should have been done from the very beginning. Fixes: `53342023ee` ("mlxsw: spectrum_router: Implement LPM trees management") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-10-28 13:43:56 -04:00
Linus Torvalds	965c4b7e1a	Merge branches 'core-urgent-for-linus', 'irq-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool, irq and scheduler fixes from Ingo Molnar: "One more objtool fixlet for GCC6 code generation patterns, an irq DocBook fix and an unused variable warning fix in the scheduler" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix rare switch jump table pattern detection * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: doc: Add missing parameter for msi_setup * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Remove unused but set variable 'rq'	2016-10-28 10:12:27 -07:00
Vineet Gupta	b75dcd9c7d	ARC: module: print pretty section names Now that we have referece to section name string table in apply_relocate_add(), use it to - print the name of section being relocated - print symbol with NULL name (since it refers to a section) before \| Section to fixup 7000a060 \| ========================================================= \| rela->r_off \| rela->addend \| sym->st_value \| ADDR \| VALUE \| ========================================================= \| 1c 0 7000e000 7000a07c 7000e000 [] \| 40 0 7000a000 7000a0a0 7000a000 [] after \| Section to fixup .eh_frame @7000a060 \| ========================================================= \| r_off r_add st_value ADDRESS VALUE \| ========================================================= \| 1c 0 7000e000 7000a07c 7000e000 [.init.text] \| 40 0 7000a000 7000a0a0 7000a000 [.exit.text] Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:10:29 -07:00
Vineet Gupta	d65283f7b6	ARC: module: elide loop to save reference to .eh_frame The loop was really needed in .debug_frame regime where wanted make it as SH_ALLOC so that apply_relocate_add() would process it. That's not needed for .eh_frame, so we check this in apply_relocate_add() which gets called for each section. Note that we need to save reference to "section name strings" section in module_frob_arch_sections() since apply_relocate_add() doesn't get that Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:10:28 -07:00
Vineet Gupta	f644e36888	ARC: mm: retire ARC_DBG_TLB_MISS_COUNT... ... given that we have perf counters abel to do the same thing non intrusively Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:10:28 -07:00
Vineet Gupta	c300547588	ARC: build: retire old toggles These are really ancient toggles and tools no longer require them to be passed. This paves way for deprecating them in long run. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:10:27 -07:00
Vineet Gupta	d975cbc8ac	ARC: boot log: refactor cpu name/release printing The motivation is to identify ARC750 vs. ARC770 (we currently print generic "ARC700"). A given ARC700 release could be 750 or 770, with same ARCNUM (or family identifier which is unfortunate). The existing arc_cpu_tbl[] kept a single concatenated string for core name and release which thus doesn't work for 750 vs. 770 identification. So split this into 2 tables, one with core names and other with release. And while we are at it, get rid of the range checking for family numbers. We just document the known to exist cores running Linux and ditch others. With this in place, we add detection of ARC750 which is - cores 0x33 and before - cores 0x34 and later with MMUv2 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:09:07 -07:00
Vineet Gupta	d7c46114e3	ARC: boot log: remove awkward space comma from MMU line Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:09:06 -07:00
Vineet Gupta	a024fd9bc4	ARC: boot log: don't assume SWAPE instruction support This came to light when helping a customer with oldish ARC750 core who were getting instruction errors because of lack of SWAPE but boot log was incorrectly printing it as being present Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:09:06 -07:00
Vineet Gupta	73e284d257	ARC: boot log: refactor printing abt features not captured in BCRs On older arc700 cores, some of the features configured were not present in Build config registers. To print about them at boot, we just use the Kconfig option i.e. whether linux is built to use them or not. So yes this seems bogus, but what else can be done. Moreover if linux is booting with these enabled, then the Kconfig info is a good indicator anyways. Over time these "hacks" accumulated in read_arc_build_cfg_regs() as well as arc_cpu_mumbojumbo(). so refactor and move all of those in a single place: read_arc_build_cfg_regs(). This causes some code redcution too: \| bloat-o-meter2 arch/arc/kernel/setup.o.0 arch/arc/kernel/setup.o.1 \| add/remove: 0/0 grow/shrink: 2/1 up/down: 64/-132 (-68) \| function old new delta \| setup_processor 610 670 +60 \| cpuinfo_arc700 76 80 +4 \| arc_cpu_mumbojumbo 752 620 -132 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:07:43 -07:00
Linus Torvalds	f6167514c8	Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "My patch fixes the btrfs list_head abuse that we tracked down during Dave Jones' memory corruption investigation. With both Jens and my patches in place, I'm no longer able to trigger problems. Filipe is fixing a difficult old bug between snapshots, balance and send. Dave is cooking a few more for the next rc, but these are tested and ready" * 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix races on root_log_ctx lists btrfs: fix incremental send failure caused by balance	2016-10-28 10:07:35 -07:00
Vineet Gupta	711c1f2671	ARCv2: boot log: print IOC exists as well as enabled status Previously we would not print the case when IOC existed but was not enabled. And while at it, reduce one line off boot printing by consolidating the Peripheral address space and IO-Coherency which in a way applies to them Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-10-28 10:06:48 -07:00
Linus Torvalds	2cd0b50a18	sound fixes for 4.9-rc3 Here contains the usual stuff -- the fixups and quirks for HD-audio and USB-audio, in addition to a bad regression fix in ALSA sequencer timer since 4.8, and a trivial fix for asihpi PCI driver. -----BEGIN PGP SIGNATURE----- iQIrBAABCAAVBQJYEx+3Dhx0aXdhaUBzdXNlLmRlAAoJEGwxgFQ9KSmkLz0QAKl5 lPbtUc3qW8KXqLKifzFX9hds8SH6YJ36SBpUsqteb3PNpm63ZQJmvevDpt7ey3jQ M9i/peAl7qoIiyXLVdSEHgV+erIMKSt6eYlkUf6wZZWw6GxYXE40DNSEgAx7sHDs dVs/q4hC4/xRRmbqdG1fxM6sBAQUiKx62P0QwVQXc2SiGhUOqwILF93UwdL+9ZPm 8R8bOnjVMbLLZolJRK3ll/qkFgmJFLhXYE5NF1d0vcoA7f3qbtcp0xOLZ1QwmTFZ DJHkaO2K6Jg9O8NkmeNIfmdiHXpte69roxDqBQKoWiV4CRPY7Y1OvrGvi0Z37X0Z EickpItM1hCGd7W0WDMuwVrkUIS4osoisQ/ug8hxl4MPECHlrnCjctoRNBLf1+JF l3+j+XC+A8lsys8O09VhcZxK5Ns6ErBLR/e8X8ANQVGatt1XXo0usygdxnP9E8yD ZuuLa2JW2SuucRsuXe/y/LFSBPYnKqAQKhmd5UyqC8ndjfX/0DbWBnkGf5qu99ww sz/zhAmHkSswwbuYdJ/b474NmeHmu+wGR7euKlks4JO4UHKJZZ+5BXNjZbgzDneS /s6WnoPHOh9my8NgLDLPMimsTStCI6Z89khcgtgij7xAvIEiYZJ0vGZ2FeHYbT79 n9GWJikYtTxvczVEQI2X8MuJyi1PjDacmUdQ6hc/ =n80w -----END PGP SIGNATURE----- Merge tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This contains the usual stuff -- the fixups and quirks for HD-audio and USB-audio, in addition to a bad regression fix in ALSA sequencer timer since 4.8, and a trivial fix for asihpi PCI driver" * tag 'sound-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add quirk for Syntek STK1160 ALSA: seq: Fix time account regression ALSA: hda - Fix surround output pins for ASRock B150M mobo ALSA: hda - Fix headset mic detection problem for two Dell laptops ALSA: asihpi: fix kernel memory disclosure ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table ALSA: hda - allow 40 bit DMA mask for NVidia devices	2016-10-28 10:00:44 -07:00
Linus Torvalds	bdb520845b	patches to fix a regression in 4.9-rc1 on x86 PAT -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYEFHpAAoJEAx081l5xIa+tyYP/0xq+ZqHwS90k1mge/2uWYB3 sVQvFFIV55r6siOjIdDek+dsHq7IGFOChbsxegGyGvfwYVjzSmdoBwO1aMTV+Ii9 OoqLS/53kts9jHOVm1UNsbxW1lzJVWoFWpMY57KDodWsWxVbd0NuP9mfTRIH2Sfj MmymKigXgwHSndn07+2xp9jI9Y5krtOLl+4YDsly7JF2IR7UBRRoW8n/WHR75lny MNn2Vtn9NBwxDieFQc/KQGUQ1nC8wB0c3wtGDDQIux0gp6IVW+pQoCLo6CMtgHXB IXGDojVA9KpcyEUz5RkBsVHYmvZR1PoS+nrnEE6b/C8p7UDuyCrk1Zfy0ZTGV/hq LKmfRKB3NWbgKnBbqOdFYhsh/iyVjqoNdGYqfR4qJx5JGIltVWbjYwlwUpImlrIY gKqtAdVFaFuoJs8MpFharxFlBf/o9DPDTPTWPQxGI16y7poH+86v7QmAJT9dJHRE pf3oyYI3eHTeIQb42f7PHSp4hsVJMX5Awkm9a8b9PhNlu/3cHUOYkCT060ripMBc ZksIUqKFzuk+TDRTnQrCQjaC4vJ6s8XUwntFhfHCZUmnnH8YhYpryDwdyzavcUvX or8rkKsO/+Jxa1kRr8d2c1JYi2FIMHBP0srAu43WeYyAsSPFIL/9l5flIeHi2Ow3 tSHbCo4W5YRbQaVcBzxG =prah -----END PGP SIGNATURE----- Merge tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux Pull drm x86/pat regression fixes from Dave Airlie: "This is a standalone pull request for the fix for a regression introduced in -rc1 by a change to vm_insert_mixed to start using the PAT range tracking to validate page protections. With this fix in place, all the VRAM mappings for GPU drivers ended up at UC instead of WC. There are probably better ways to fix this long term, but nothing I'd considered for -fixes that wouldn't need more settling in time. So I've just created a new arch API that the drivers can reserve all their VRAM aperture ranges as WC" * tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux: drm/drivers: add support for using the arch wc mapping API. x86/io: add interface to reserve io memtype for a resource range. (v1.1)	2016-10-28 09:36:07 -07:00
Linus Torvalds	e0f3e6a7cc	- A couple DM raid and DM mirror fixes - A couple .request_fn request-based DM NULL pointer fixes - A fix for a DM target reference count leak, on target load error, that prevented associated DM target kernel module(s) from being removed -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJYEo+lAAoJEMUj8QotnQNaGfkH/jGqr4bj4l2Ty3QgV95fYW7+ lqp4Flkevm35HotEGKuuizvqbbVrj57BCGLE+dV48/X2cv5QbUFht6QBu9iJTrk6 Q7VqyBOvDDnOZHIof5CfKBeLZ2gd8YHZwUpYvzJcThSWS1+LjeVqg8a33LMZroMQ rghVxFCIKy6LqCryIiTHk1t+OfmuBz3S2LXcQXFY7XAPpWq/f+V66gthTZUpm86+ Gu1xOHQlvnmf5xnDUxCpPVbQNY334D/aSbU73i2cdvfL1pkxBFNcI+LbPcu+sNP9 ugGjPj4etbIRsVysuW3fLhn2kKqaXXVuD1rLTQ+C3ytciI+RQJvG892gWhAABRQ= =apHk -----END PGP SIGNATURE----- Merge tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a couple DM raid and DM mirror fixes - a couple .request_fn request-based DM NULL pointer fixes - a fix for a DM target reference count leak, on target load error, that prevented associated DM target kernel module(s) from being removed * tag 'dm-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: fix missing dm_put_target_type() in dm_table_add_target() dm rq: clear kworker_task if kthread_run() returned an error dm: free io_barrier after blk_cleanup_queue call dm raid: fix activation of existing raid4/10 devices dm mirror: use all available legs on multiple failures dm mirror: fix read error on recovery after default leg failure dm raid: fix compat_features validation	2016-10-28 09:27:58 -07:00
Linus Torvalds	43937003de	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull key fixes from James Morris: - fix a buffer overflow when displaying /proc/keys [CVE-2016-7042]. - fix broken initialisation in the big_key implementation that can result in an oops. - make big_key depend on having a random number generator available in Kconfig. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security/keys: make BIG_KEYS dependent on stdrng. KEYS: Sort out big_key initialisation KEYS: Fix short sprintf buffer in /proc/keys show function	2016-10-28 09:23:59 -07:00
Richard Weinberger	a00052a296	ubifs: Fix regression in ubifs_readdir() Commit `c83ed4c9db` ("ubifs: Abort readdir upon error") broke overlayfs support because the fix exposed an internal error code to VFS. Reported-by: Peter Rosin <peda@axentia.se> Tested-by: Peter Rosin <peda@axentia.se> Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Fixes: `c83ed4c9db` ("ubifs: Abort readdir upon error") Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>	2016-10-28 14:48:31 +02:00
Boris Brezillon	40b6e61ac7	ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap() Commit `e96a8a3bb6` ("UBI: Fastmap: Do not add vol if it already exists") introduced a bug by changing the possible error codes returned by add_vol(): - this function no longer returns NULL in case of allocation failure but return ERR_PTR(-ENOMEM) - when a duplicate entry in the volume RB tree is found it returns ERR_PTR(-EEXIST) instead of ERR_PTR(-EINVAL) Fix the tests done on add_vol() return val to match this new behavior. Fixes: `e96a8a3bb6` ("UBI: Fastmap: Do not add vol if it already exists") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2016-10-28 14:48:18 +02:00
Gabriel Krisman Bertazi	a7d5afe82d	MAINTAINERS: Add entry for genwqe driver Frank and I maintain this Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: haver@linux.vnet.ibm.com Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>= Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:28:24 -04:00
Jorgen Hansen	eb94cd68ab	VMCI: Doorbell create and destroy fixes This change consists of two changes: 1) If vmci_doorbell_create is called when neither guest nor host personality as been initialized, vmci_get_context_id will return VMCI_INVALID_ID. In that case, we should fail the create call. 2) In doorbell destroy, we assume that vmci_guest_code_active() has the same return value on create and destroy. That may not be the case, so we may end up with the wrong refcount. Instead, destroy should check explicitly whether the doorbell is in the index table as an indicator of whether the guest code was active at create time. Reviewed-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:26:21 -04:00
Gerald Schaefer	a7a7aeefbc	GenWQE: Fix bad page access during abort of resource allocation When interrupting an application which was allocating DMAable memory, it was possible, that the DMA memory was deallocated twice, leading to the error symptoms below. Thanks to Gerald, who analyzed the problem and provided this patch. I agree with his analysis of the problem: ddcb_cmd_fixups() -> genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() -> genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and f/lpage maybe also != NULL) In this scenario we would have exactly the kind of double free that would explain the WARNING / Bad page state, and as expected it is caused by broken error handling (cleanup). Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce the "Bad page state" issue, and with the patch on top he could not reproduce it any more. ------------[ cut here ]------------ WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141 Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000 Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400 0000000000000400 0000000000001fff 0000000000000400 0000000116853000 07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001 0000000000001000 0000000116854000 0000000000156402 00000000324e7a38 Krnl Code: 000000000015633a: 95001000 cli 0(%r1),0 000000000015633e: a774ffc3 brc 7,1562c4 #0000000000156342: a7f40001 brc 15,156344 >0000000000156346: 92011000 mvi 0(%r1),1 000000000015634a: a7f4ffbd brc 15,1562c4 000000000015634e: 0707 bcr 0,%r7 0000000000156350: c00400000000 brcl 0,156350 0000000000156356: eb7ff0500024 stmg %r7,%r15,80(%r15) Call Trace: ([<00000000001563e0>] dma_update_trans+0x90/0x228) [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160 [<00000000001567c2>] s390_dma_free+0x62/0x98 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card] [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card] [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card] [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card] [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card] [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518 [<0000000000325344>] SyS_ioctl+0xa4/0xb8 [<00000000007b86c6>] system_call+0xd6/0x264 [<000003ff9e8e520a>] 0x3ff9e8e520a Last Breaking-Event-Address: [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8 ---[ end trace 35996336235145c8 ]--- BUG: Bad page state in process jbd2/dasdb1-8 pfn:3215b page:000003d100c856c0 count:-1 mapcount:0 mapping: (null) index:0x0 flags: 0x3fffc0000000000() page dumped because: nonzero _count Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:25:18 -04:00
Martyn Welch	6ad37567b6	vme: vme_get_size potentially returning incorrect value on failure The function vme_get_size returns the size of the window to the caller, however it doesn't check the return value of the call to vme_master_get. Return 0 on failure rather than anything else. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:25:18 -04:00
Rob Herring	d0f4bce2bc	tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup Since commit `761ed4a945` ("tty: serial_core: convert uart_close to use tty_port_close"), the serial console is broken on various systems and typing "reboot" splats the following on the serial console: INIT: Sending p[ 427.863916] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0 [ 427.885156] IP: [] tty_wakeup+0xc/0x70 [ 427.898337] PGD 0 [ 427.902051] [ 427.907498] Oops: 0000 [#1] PREEMPT SMP [ 427.917635] Modules linked in: nfsv3 nfs_acl nfs fscache lockd sunrpc grace edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse loop md_mod dm_mod joydev hid_generic usbhid ipmi_ssif ohci_pci ohci_hcd ehci_pci ehci_hcd e1000e ptp firewire_ohci edac_core pps_core tpm_infineon sp5100_tco firewire_core acpi_cpufreq serio_raw pcspkr fjes usbcore shpchp edac_mce_amd tpm_tis ipmi_si tpm_tis_core i2c_piix4 k10temp sg ipmi_msghandler tpm sr_mod button cdrom kvm_amd kvm irqbypass crc_itu_t ast ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw ata_generic pata_atiixp [ 428.054179] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc1-1.g73e3f23-default #1 [ 428.072868] Hardware name: System manufacturer System Product Name/KGP(M)E-D16, BIOS 0902 12/03/2010 [ 428.094755] task: ffffffffa2c0d500 task.stack: ffffffffa2c00000 [ 428.109717] RIP: 0010:[] [] tty_wakeup+0xc/0x70 [ 428.128407] RSP: 0018:ffff9a1a5fc03df8 EFLAGS: 00010086 [ 428.142184] RAX: ffff9a1857258000 RBX: ffffffffa3050ea0 RCX: 0000000000000000 [ 428.159649] RDX: 000000000000001b RSI: 0000000000000000 RDI: 0000000000000000 [ 428.177109] RBP: ffff9a1a5fc03e08 R08: 0000000000000000 R09: 0000000000000000 [ 428.194547] R10: 0000000000021c77 R11: 0000000000000000 R12: ffff9a1857258000 [ 428.212002] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000020 [ 428.229481] FS: 0000000000000000(0000) GS:ffff9a1a5fc00000(0000) knlGS:0000000000000000 [ 428.248938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 428.263726] CR2: 00000000000001e0 CR3: 0000000390c06000 CR4: 00000000000006f0 [ 428.281331] Stack: [ 428.288696] ffffffffa3050ea0 ffff9a1857258000 ffff9a1a5fc03e18 ffffffffa24e0ab1 [ 428.307064] ffff9a1a5fc03e40 ffffffffa24e8865 ffffffffa3050ea0 00000000000000c2 [ 428.325456] 0000000000000046 ffff9a1a5fc03e78 ffffffffa24e8a5f ffffffffa3050ea0 [ 428.343905] Call Trace: [ 428.352319] [ 428.356216] [] uart_write_wakeup+0x21/0x30 The problem is for console ports, the serial port is not shutdown and interrupts may fire after the struct tty is gone. Simply calling the tty_port helper tty_port_tty_wakeup instead of tty_wakeup directly will ensure there is a valid struct tty. Fixes: `761ed4a945` ("tty: serial_core: convert uart_close to use tty_port_close") Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Mike Galbraith <mgalbraith@suse.de> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-serial@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:13:07 -04:00
Geert Uytterhoeven	4dda864d73	tty: serial_core: Fix serial console crash on port shutdown The port->console flag is always false, as uart_console() is called before the serial console has been registered. Hence for a serial port used as the console, uart_tty_port_shutdown() will still be called when userspace closes the port, powering it down. This may lead to a system lock up when the serial console driver writes to the serial port's registers. To fix this, move the setting of port->console after the call to uart_configure_port(), which registers the serial console. Fixes: `761ed4a945` ("tty: serial_core: convert uart_close to use tty_port_close") Reported-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring <robh@kernel.org> Tested-by: Mugunthan V N <mugunthanvnm@ti.com> Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> [robh: rebased on tty-linus] Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:13:07 -04:00
Richard Genoud	9bcffe7575	tty/serial: at91: fix hardware handshake on Atmel platforms After commit `1cf6e8fc83` ("tty/serial: at91: fix RTS line management when hardware handshake is enabled"), the hardware handshake wasn't functional anymore on Atmel platforms (beside SAMA5D2). To understand why, one has to understand the flag ATMEL_US_USMODE_HWHS first: Before commit `1cf6e8fc83` ("tty/serial: at91: fix RTS line management when hardware handshake is enabled"), this flag was never set. Thus, the CTS/RTS where only handled by serial_core (and everything worked just fine). This commit introduced the use of the ATMEL_US_USMODE_HWHS flag, enabling it for all boards when the user space enables flow control. When the ATMEL_US_USMODE_HWHS is set, the Atmel USART controller handles a part of the flow control job: - disable the transmitter when the CTS pin gets high. - drive the RTS pin high when the DMA buffer transfer is completed or PDC RX buffer full or RX FIFO is beyond threshold. (depending on the controller version). NB: This feature is not mandatory for the flow control to work. (Nevertheless, it's very useful if low latencies are needed.) Now, the specifics of the ATMEL_US_USMODE_HWHS flag: - For platforms with DMAC and no FIFOs (sam9x25, sam9x35, sama5D3, sama5D4, sam9g15, sam9g25, sam9g35)* this feature simply doesn't work. ( source: https://lkml.org/lkml/2016/9/7/598 ) Tested it on sam9g35, the RTS pins always stays up, even when RXEN=1 or a new DMA transfer descriptor is set. => ATMEL_US_USMODE_HWHS must not be used for those platforms - For platforms with a PDC (sam926{0,1,3}, sam9g10, sam9g20, sam9g45, sam9g46), there's another kind of problem. Once the flag ATMEL_US_USMODE_HWHS is set, the RTS pin can't be driven anymore via RTSEN/RTSDIS in USART Control Register. The RTS pin can only be driven by enabling/disabling the receiver or setting RCR=RNCR=0 in the PDC (Receive (Next) Counter Register). => Doing this is beyond the scope of this patch and could add other bugs, so the original (and working) behaviour should be set for those platforms (meaning ATMEL_US_USMODE_HWHS flag should be unset). - For platforms with a FIFO (sama5d2), the RTS pin is driven according to the RX FIFO thresholds, and can be also driven by RTSEN/RTSDIS in USART Control Register. No problem here. (This was the use case of commit `1cf6e8fc83` ("tty/serial: at91: fix RTS line management when hardware handshake is enabled")) NB: If the CTS pin declared as a GPIO in the DTS, (for instance cts-gpios = <&pioA PIN_PB31 GPIO_ACTIVE_LOW>), the transmitter will be disabled. => ATMEL_US_USMODE_HWHS flag can be set for this platform ONLY IF the CTS pin is not a GPIO. So, the only case when ATMEL_US_USMODE_HWHS can be enabled is when (atmel_use_fifo(port) && !mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS)) Tested on all Atmel USART controller flavours: AT91SAM9G35-CM (DMAC flavour), AT91SAM9G20-EK (PDC flavour), SAMA5D2xplained (FIFO flavour). * the list may not be exhaustive Cc: <stable@vger.kernel.org> #4.4+ (beware, missing atmel_port variable) Fixes: `1cf6e8fc83` ("tty/serial: at91: fix RTS line management when hardware handshake is enabled") Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-28 08:10:48 -04:00
Imre Palik	f92b760414	perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors perf doesn't seem to honour the number of fixed counters specified by CPUID leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters. So if some of the fixed counters are masked out by the hypervisor, it still tries to check/set them. This patch makes perf behave nicer when the kernel is running under a hypervisor that doesn't expose all the counters. This patch contains some ideas from Matt Wilson. Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Kozyrev <alexander.kozyrev@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wilson <msw@amazon.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 11:06:25 +02:00
Jiri Olsa	5aab90ce1e	perf/powerpc: Don't call perf_event_disable() from atomic context The trinity syscall fuzzer triggered following WARN() on powerpc: WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278 ... NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0 LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 Call Trace: [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable) [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0 [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0 [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0 [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100 [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48 Followed by a lockdep warning: =============================== [ INFO: suspicious RCU usage. ] 4.8.0-rc5+ #7 Tainted: G W ------------------------------- ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by ls/2998: #0: (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0 #1: (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0 stack backtrace: CPU: 9 PID: 2998 Comm: ls Tainted: G W 4.8.0-rc5+ #7 Call Trace: [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable) [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180 [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0 [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0 [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380 [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60 [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0 [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0 [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0 [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0 [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100 [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48 While it looks like the first WARN() is probably valid, the other one is triggered by disabling event via perf_event_disable() from atomic context. The event is disabled here in case we were not able to emulate the instruction that hit the breakpoint. By disabling the event we unschedule the event and make sure it's not scheduled back. But we can't call perf_event_disable() from atomic context, instead we need to use the event's pending_disable irq_work method to disable it. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 11:06:25 +02:00
Jiri Olsa	0933840acf	perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic CAI Qian reported a crash in the PMU uncore device removal code, enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option: https://marc.info/?l=linux-kernel&m=147688837328451 The reason for the crash is that perf_pmu_unregister() tries to remove a PMU device which is not added at this point. We add PMU devices only after pmu_bus is registered, which happens in the perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag. The fix is to get the 'pmu_bus_running' flag state at the point the PMU is taken out of the PMU list and remove the device later only if it's set. Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 11:06:25 +02:00
Borislav Petkov	1c27f646b1	x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y We needed the physical address of the container in order to compute the offset within the relocated ramdisk. And we did this by doing __pa() on the virtual address. However, __pa() does checks whether the physical address is within PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address which doesn't have the randomization offset into a function which uses PAGE_OFFSET which does have that offset. This makes this check fire: VIRTUAL_BUG_ON((x > y) \|\| !phys_addr_valid(x)); ^^^^^^ due to the randomization offset. The fix is as simple as using __pa_nodebug() because we do that randomization offset accounting later in that function ourselves. Reported-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm <linux-mm@kvack.org> Cc: stable@vger.kernel.org # 4.9 Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-28 10:29:59 +02:00
Linus Torvalds	14970f204b	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk cris/arch-v32: cryptocop: print a hex number after a 0x prefix ipack: print a hex number after a 0x prefix block: DAC960: print a hex number after a 0x prefix fs: exofs: print a hex number after a 0x prefix lib/genalloc.c: start search from start of chunk mm: memcontrol: do not recurse in direct reclaim CREDITS: update credit information for Martin Kepplinger proc: fix NULL dereference when reading /proc/<pid>/auxv mm: kmemleak: ensure that the task stack is not freed during scanning lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB latent_entropy: raise CONFIG_FRAME_WARN by default kconfig.h: remove config_enabled() macro ipc: account for kmem usage on mqueue and msg mm/slab: improve performance of gathering slabinfo stats mm: page_alloc: use KERN_CONT where appropriate mm/list_lru.c: avoid error-path NULL pointer deref h8300: fix syscall restarting kcov: properly check if we are in an interrupt mm/slab: fix kmemcg cache creation delayed issue	2016-10-27 19:58:39 -07:00
Dimitri Sivanich	8e819101ce	drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk Would like to have this be a decimal number. Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.com Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Uwe Kleine-König	17a8893956	cris/arch-v32: cryptocop: print a hex number after a 0x prefix It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Uwe Kleine-König	9105585d13	ipack: print a hex number after a 0x prefix It makes the result hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Cc: Jens Taprogge <jens.taprogge@taprogge.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Uwe Kleine-König	ee52c44dee	block: DAC960: print a hex number after a 0x prefix It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Uwe Kleine-König	14f947c87a	fs: exofs: print a hex number after a 0x prefix It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@primarydata.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Daniel Mentz	62e931fac4	lib/genalloc.c: start search from start of chunk gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: `7f184275aa` ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Johannes Weiner	89a2848381	mm: memcontrol: do not recurse in direct reclaim On 4.0, we saw a stack corruption from a page fault entering direct memory cgroup reclaim, calling into btrfs_releasepage(), which then tried to allocate an extent and recursed back into a kmem charge ad nauseam: [...] btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 memcg_charge_kmem+0x40/0x80 new_slab+0x2d9/0x5a0 __slab_alloc+0x2fd/0x44f kmem_cache_alloc+0x193/0x1e0 alloc_extent_state+0x21/0xc0 __clear_extent_bit+0x2b5/0x400 try_release_extent_mapping+0x1a3/0x220 __btrfs_releasepage+0x31/0x70 btrfs_releasepage+0x2c/0x30 try_to_release_page+0x32/0x50 shrink_page_list+0x6da/0x7a0 shrink_inactive_list+0x1e5/0x510 shrink_lruvec+0x605/0x7f0 shrink_zone+0xee/0x320 do_try_to_free_pages+0x174/0x440 try_to_free_mem_cgroup_pages+0xa7/0x130 try_charge+0x17b/0x830 mem_cgroup_try_charge+0x65/0x1c0 handle_mm_fault+0x117f/0x1510 __do_page_fault+0x177/0x420 do_page_fault+0xc/0x10 page_fault+0x22/0x30 On later kernels, kmem charging is opt-in rather than opt-out, and that particular kmem allocation in btrfs_releasepage() is no longer being charged and won't recurse and overrun the stack anymore. But it's not impossible for an accounted allocation to happen from the memcg direct reclaim context, and we needed to reproduce this crash many times before we even got a useful stack trace out of it. Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to avoid recursing into any other form of direct reclaim. Then let recursive charges from PF_MEMALLOC contexts bypass the cgroup limit. Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Martin Kepplinger	8f72cb4ef9	CREDITS: update credit information for Martin Kepplinger Content and employer changed. Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.com Signed-off-by: Martin Kepplinger <martink@posteo.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Leon Yu	06b2849d10	proc: fix NULL dereference when reading /proc/<pid>/auxv Reading auxv of any kernel thread results in NULL pointer dereferencing in auxv_read() where mm can be NULL. Fix that by checking for NULL mm and bailing out early. This is also the original behavior changed by recent commit `c531716785` ("proc: switch auxv to use of __mem_open()"). # cat /proc/2/auxv Unable to handle kernel NULL pointer dereference at virtual address 000000a8 Internal error: Oops: 17 [#1] PREEMPT SMP ARM CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1 Hardware name: BCM2709 task: ea3b0b00 task.stack: e99b2000 PC is at auxv_read+0x24/0x4c LR is at do_readv_writev+0x2fc/0x37c Process cat (pid: 113, stack limit = 0xe99b2210) Call chain: auxv_read do_readv_writev vfs_readv default_file_splice_read splice_direct_to_actor do_splice_direct do_sendfile SyS_sendfile64 ret_fast_syscall Fixes: `c531716785` ("proc: switch auxv to use of __mem_open()") Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Janis Danisevskis <jdanis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Catalin Marinas	37df49f433	mm: kmemleak: ensure that the task stack is not freed during scanning Commit `68f24b08ee` ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed during kmemleak_scan() execution, leading to either a NULL pointer fault (if task->stack is NULL) or kmemleak accessing already freed memory. This patch uses the new try_get_task_stack() API to ensure that the task stack is not freed during kmemleak stack scanning. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901. Fixes: `68f24b08ee` ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK") Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: CAI Qian <caiqian@redhat.com> Tested-by: CAI Qian <caiqian@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: CAI Qian <caiqian@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Dmitry Vyukov	02754e0a48	lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls. Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on second level). Size of each stack is (num_frames + 3) * sizeof(long). Which gives us ~84K stacks. This capacity was chosen empirically and it is enough to run kernel normally. However, when lots of configs are enabled and a fuzzer tries to maximize code coverage, it easily hits the limit within tens of minutes. I've tested for long a time with number of top level entries bumped 4x (4096). And I think I've seen overflow only once. But I don't have all configs enabled and code coverage has not reached maximum yet. So bump it 8x to 8192. Since we have two-level table, memory cost of this is very moderate -- currently the top-level table is 8KB, with this patch it is 64KB, which is negligible under KASAN. Here is some approx math. 128MB allows us to memorize ~670K stacks (assuming stack is ~200b). I've grepped kernel for kmalloc\|kfree\|kmem_cache_alloc\|kmem_cache_free\| kzalloc\|kstrdup\|kstrndup\|kmemdup and it gives ~60K matches. Most of alloc/free call sites are reachable with only one stack. But some utility functions can have large fanout. Assuming average fanout is 5x, total number of alloc/free stacks is ~300K. Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Kees Cook	0e07f663c9	latent_entropy: raise CONFIG_FRAME_WARN by default When building with the latent_entropy plugin, set the default CONFIG_FRAME_WARN to 2048, since some __init functions have many basic blocks that, when instrumented by the latent_entropy plugin, grow beyond 1024 byte stack size on 32-bit builds. Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michal Marek <mmarek@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Masahiro Yamada	c0a0aba8e4	kconfig.h: remove config_enabled() macro The use of config_enabled() is ambiguous. For config options, IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer. Sometimes config_enabled() has been used for non-config options because it is useful to check whether the given symbol is defined or not. I have been tackling on deprecating config_enabled(), and now is the time to finish this work. Some new users have appeared for v4.9-rc1, but it is trivial to replace them: - arch/x86/mm/kaslr.c replace config_enabled() with IS_ENABLED() because CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean. - include/asm-generic/export.h replace config_enabled() with __is_defined(). Then, config_enabled() can be removed now. Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config options, and __is_defined() for non-config symbols. Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Aristeu Rozanski	8c8d4d4520	ipc: account for kmem usage on mqueue and msg When kmem accounting switched from account by default to only account if flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out. The production use case at hand is that mqueues should be customizable via sysctls in Docker containers in a Kubernetes cluster. This can only be safely allowed to the users of the cluster (without the risk that they can cause resource shortage on a node, influencing other users' containers) if all resources they control are bounded, i.e. accounted for. Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Reported-by: Stefan Schimanski <sttts@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Schimanski <sttts@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Aruna Ramakrishna	07a63c41fa	mm/slab: improve performance of gathering slabinfo stats On large systems, when some slab caches grow to millions of objects (and many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2 seconds. During this time, interrupts are disabled while walking the slab lists (slabs_full, slabs_partial, and slabs_free) for each node, and this sometimes causes timeouts in other drivers (for instance, Infiniband). This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for total number of allocated slabs per node, per cache. This counter is updated when a slab is created or destroyed. This enables us to skip traversing the slabs_full list while gathering slabinfo statistics, and since slabs_full tends to be the biggest list when the cache is large, it results in a dramatic performance improvement. Getting slabinfo statistics now only requires walking the slabs_free and slabs_partial lists, and those lists are usually much smaller than slabs_full. We tested this after growing the dentry cache to 70GB, and the performance improved from 2s to 5ms. Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Joe Perches	1f84a18fc0	mm: page_alloc: use KERN_CONT where appropriate Recent changes to printk require KERN_CONT uses to continue logging messages. So add KERN_CONT where necessary. [akpm@linux-foundation.org: coding-style fixes] Fixes: `4bcc595ccd` ("printk: reinstate KERN_CONT for printing continuation lines") Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:43 -07:00
Alexander Polakov	1bc11d70b5	mm/list_lru.c: avoid error-path NULL pointer deref As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821: After some analysis it seems to be that the problem is in alloc_super(). In case list_lru_init_memcg() fails it goes into destroy_super(), which calls list_lru_destroy(). And in list_lru_init() we see that in case memcg_init_list_lru() fails, lru->node is freed, but not set NULL, which then leads list_lru_destroy() to believe it is initialized and call memcg_destroy_list_lru(). memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus, which is NULL. [akpm@linux-foundation.org: add comment] Signed-off-by: Alexander Polakov <apolyakov@beget.ru> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-27 18:43:42 -07:00

1 2 3 4 5 ...

634110 Commits