linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-30 14:52:05 +00:00

Author	SHA1	Message	Date
Tang Chen	0bd8542008	mem-hotplug: reset node present pages when hot-adding a new pgdat When memory is hot-added, all the memory is in offline state. So clear all zones' present_pages because they will be updated in online_pages() and offline_pages(). Otherwise, /proc/zoneinfo will corrupt: When the memory of node2 is offline: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 8388608 managed 0 When we online memory on node2: # cat /proc/zoneinfo ...... Node 2, zone Movable ...... spanned 8388608 present 16777216 managed 8388608 Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Tang Chen	f784a3f196	mem-hotplug: reset node managed pages when hot-adding a new pgdat In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Joonsoo Kim	57cbc87e03	mm/debug-pagealloc: correct freepage accounting and order resetting One thing I did in this patch is fixing freepage accounting. If we clear guard page and link it onto isolate buddy list, we should not increase freepage count. This patch adds conditional branch to skip counting in this case. Without this patch, this overcounting happens frequently if guard order is set and CMA is used. Another thing fixed in this patch is the target to reset order. In __free_one_page(), we check the buddy page whether it is a guard page or not. And, if so, we should clear guard attribute on the buddy page and reset order of it to 0. But, current code resets original page's order rather than buddy one's. Maybe, this doesn't have any problem, because whole merged page's order will be re-assigned soon. But, it is better to correct code. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Jan Kara	8edc6e1688	fanotify: fix notification of groups with inode & mount marks fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Vlastimil Babka	1d5bfe1ffb	mm, compaction: prevent infinite loop in compact_zone Several people have reported occasionally seeing processes stuck in compact_zone(), even triggering soft lockups, in 3.18-rc2+. Testing a revert of commit `e14c720efd` ("mm, compaction: remember position within pageblock in free pages scanner") fixed the issue, although the stuck processes do not appear to involve the free scanner. Finally, by code inspection, the bug was found in isolate_migratepages() which uses a slightly different condition to detect if the migration and free scanners have met, than compact_finished(). That has not been a problem until commit `e14c720efd` allowed the free scanner position between individual invocations to be in the middle of a pageblock. In a relatively rare case, the migration scanner position can end up at the beginning of a pageblock, with the free scanner position in the middle of the same pageblock. If it's the migration scanner's turn, isolate_migratepages() exits immediately (without updating the position), while compact_finished() decides to continue compaction, resulting in a potentially infinite loop. The system can recover only if another process creates enough high-order pages to make the watermark checks in compact_finished() pass. This patch fixes the immediate problem by bumping the migration scanner's position to meet the free scanner in isolate_migratepages(), when both are within the same pageblock. This causes compact_finished() to terminate properly. A more robust check in compact_finished() is planned as a cleanup for better future maintainability. Fixes: `e14c720efd` ("mm, compaction: remember position within pageblock in free pages scanner) Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: P. Christeas <xrg@linux.gr> Tested-by: P. Christeas <xrg@linux.gr> Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2 Reported-by: Norbert Preining <preining@logic.at> Tested-by: Norbert Preining <preining@logic.at> Link: https://lkml.org/lkml/2014/11/4/904 Reported-by: Pavel Machek <pavel@ucw.cz> Link: https://lkml.org/lkml/2014/11/7/164 Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:06 -08:00
Michal Nazarewicz	dae803e165	mm: alloc_contig_range: demote pages busy message from warn to info Having test_pages_isolated failure message as a warning confuses users into thinking that it is more serious than it really is. In reality, if called via CMA, allocation will be retried so a single test_pages_isolated failure does not prevent allocation from succeeding. Demote the warning message to an info message and reformat it such that the text "failed" does not appear and instead a less worrying "PFNS busy" is used. This message is trivially reproducible on a 10GB x86 machine on 3.16.y kernels configured with CONFIG_DMA_CMA. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	95069ac8da	mm/slab: fix unalignment problem on Malta with EVA due to slab merge Unlike SLUB, sometimes, object isn't started at the beginning of the slab in SLAB. This causes the unalignment problem after slab merging is supported by commit `12220dea07` ("mm/slab: support slab merge"). Following is the report from Markos that fail to boot on Malta with EVA. Calibrating delay loop... 19.86 BogoMIPS (lpj=99328) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 4096 (order: 0, 16384 bytes) Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes) Kernel bug detected[#1]: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea07f1 #1631 task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000 epc : 80141190 alloc_unbound_pwq+0x234/0x304 Not tainted ra : 80141184 alloc_unbound_pwq+0x228/0x304 Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000) Call Trace: alloc_unbound_pwq+0x234/0x304 apply_workqueue_attrs+0x11c/0x294 __alloc_workqueue_key+0x23c/0x470 init_workqueues+0x320/0x400 do_one_initcall+0xe8/0x23c kernel_init_freeable+0x9c/0x224 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c [ end trace cb88537fdc8fa200 ] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b alloc_unbound_pwq() allocates slab object from pool_workqueue. This kmem_cache requires 256 bytes alignment, but, current merging code doesn't honor that, and merge it with kmalloc-256. kmalloc-256 requires only cacheline size alignment so that above failure occurs. However, in x86, kmalloc-256 is luckily aligned in 256 bytes, so the problem didn't happen on it. To fix this problem, this patch introduces alignment mismatch check in find_mergeable(). This will fix the problem. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Markos Chandras <Markos.Chandras@imgtec.com> Tested-by: Markos Chandras <Markos.Chandras@imgtec.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	3c605096d3	mm/page_alloc: restrict max order of merging on isolated pageblock Current pageblock isolation logic could isolate each pageblock individually. This causes freepage accounting problem if freepage with pageblock order on isolate pageblock is merged with other freepage on normal pageblock. We can prevent merging by restricting max order of merging to pageblock order if freepage is on isolate pageblock. A side-effect of this change is that there could be non-merged buddy freepage even if finishing pageblock isolation, because undoing pageblock isolation is just to move freepage from isolate buddy list to normal buddy list rather than to consider merging. So, the patch also makes undoing pageblock isolation consider freepage merge. When un-isolation, freepage with more than pageblock order and it's buddy are checked. If they are on normal pageblock, instead of just moving, we isolate the freepage and free it in order to get merged. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	8f82b55dd5	mm/page_alloc: move freepage counting logic to __free_one_page() All the caller of __free_one_page() has similar freepage counting logic, so we can move it to __free_one_page(). This reduce line of code and help future maintenance. This is also preparation step for "mm/page_alloc: restrict max order of merging on isolated pageblock" which fix the freepage counting problem on freepage with more than pageblock order. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	51bb1a4093	mm/page_alloc: add freepage on isolate pageblock to correct buddy list In free_pcppages_bulk(), we use cached migratetype of freepage to determine type of buddy list where freepage will be added. This information is stored when freepage is added to pcp list, so if isolation of pageblock of this freepage begins after storing, this cached information could be stale. In other words, it has original migratetype rather than MIGRATE_ISOLATE. There are two problems caused by this stale information. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage will be added to normal buddy list so that it could be allocated without any restriction. And the other problem is incorrect freepage accounting. Freepages on isolate pageblock should not be counted for number of freepage. Following is the code snippet in free_pcppages_bulk(). /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, page_to_pfn(page), zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); if (likely(!is_migrate_isolate_page(page))) { __mod_zone_page_state(zone, NR_FREE_PAGES, 1); if (is_migrate_cma(mt)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); } As you can see above snippet, current code already handle second problem, incorrect freepage accounting, by re-fetching pageblock migratetype through is_migrate_isolate_page(page). But, because this re-fetched information isn't used for __free_one_page(), first problem would not be solved. This patch try to solve this situation to re-fetch pageblock migratetype before __free_one_page() and to use it for __free_one_page(). In addition to move up position of this re-fetch, this patch use optimization technique, re-fetching migratetype only if there is isolate pageblock. Pageblock isolation is rare event, so we can avoid re-fetching in common case with this optimization. This patch also correct migratetype of the tracepoint output. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	ad53f92eb4	mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Before describing bugs itself, I first explain definition of freepage. 1. pages on buddy list are counted as freepage. 2. pages on isolate migratetype buddy list are not counted as freepage. 3. pages on cma buddy list are counted as CMA freepage, too. Now, I describe problems and related patch. Patch 1: There is race conditions on getting pageblock migratetype that it results in misplacement of freepages on buddy list, incorrect freepage count and un-availability of freepage. Patch 2: Freepages on pcp list could have stale cached information to determine migratetype of buddy list to go. This causes misplacement of freepages on buddy list and incorrect freepage count. Patch 4: Merging between freepages on different migratetype of pageblocks will cause freepages accouting problem. This patch fixes it. Without patchset [3], above problem doesn't happens on my CMA allocation test, because CMA reserved pages aren't used at all. So there is no chance for above race. With patchset [3], I did simple CMA allocation test and get below result: - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation - run kernel build (make -j16) on background - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval - Result: more than 5000 freepage count are missed With patchset [3] and this patchset, I found that no freepage count are missed so that I conclude that problems are solved. On my simple memory offlining test, these problems also occur on that environment, too. This patch (of 4): There are two paths to reach core free function of buddy allocator, __free_one_page(), one is free_one_page()->__free_one_page() and the other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page(). Each paths has race condition causing serious problems. At first, this patch is focused on first type of freepath. And then, following patch will solve the problem in second type of freepath. In the first type of freepath, we got migratetype of freeing page without holding the zone lock, so it could be racy. There are two cases of this race. 1. pages are added to isolate buddy list after restoring orignal migratetype CPU1 CPU2 get migratetype => return MIGRATE_ISOLATE call free_one_page() with MIGRATE_ISOLATE grab the zone lock unisolate pageblock release the zone lock grab the zone lock call __free_one_page() with MIGRATE_ISOLATE freepage go into isolate buddy list, although pageblock is already unisolated This may cause two problems. One is that we can't use this page anymore until next isolation attempt of this pageblock, because freepage is on isolate buddy list. The other is that freepage accouting could be wrong due to merging between different buddy list. Freepages on isolate buddy list aren't counted as freepage, but ones on normal buddy list are counted as freepage. If merge happens, buddy freepage on normal buddy list is inevitably moved to isolate buddy list without any consideration of freepage accouting so it could be incorrect. 2. pages are added to normal buddy list while pageblock is isolated. It is similar with above case. This also may cause two problems. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage would be added to normal buddy list so that it could be allocated without any restriction. And the other problem is same as case 1, that it, incorrect freepage accouting. This race condition would be prevented by checking migratetype again with holding the zone lock. Because it is somewhat heavy operation and it isn't needed in common case, we want to avoid rechecking as much as possible. So this patch introduce new variable, nr_isolate_pageblock in struct zone to check if there is isolated pageblock. With this, we can avoid to re-check migratetype in common case and do it only if there is isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above mentioned problems. Changes from v3: Add one more check in free_one_page() that checks whether migratetype is MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Joonsoo Kim	5842001630	mm/compaction: skip the range until proper target pageblock is met Commit `7d49d88683` ("mm, compaction: reduce zone checking frequency in the migration scanner") has a side-effect that changes the iteration range calculation. Before the change, block_end_pfn is calculated using start_pfn, but now it blindly adds pageblock_nr_pages to the previous value. This causes the problem that isolation_start_pfn is larger than block_end_pfn when we isolate the page with more than pageblock order. In this case, isolation would fail due to an invalid range parameter. To prevent this, this patch implements skipping the range until a proper target pageblock is met. Without this patch, CMA with more than pageblock order always fails but with this patch it will succeed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Weijie Yang	c406515239	zram: avoid kunmap_atomic() of a NULL pointer zram could kunmap_atomic() a NULL pointer in a rare situation: a zram page becomes a full-zeroed page after a partial write io. The current code doesn't handle this case and performs kunmap_atomic() on a NULL pointer, which panics the kernel. This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-11-13 16:17:05 -08:00
Eric Dumazet	d649a7a81f	tcp: limit GSO packets to half cwnd In DC world, GSO packets initially cooked by tcp_sendmsg() are usually big, as sk_pacing_rate is high. When network is congested, cwnd can be smaller than the GSO packets found in socket write queue. tcp_write_xmit() splits GSO packets using the available cwnd, and we end up sending a single GSO packet, consuming all available cwnd. With GRO aggregation on the receiver, we might handle a single GRO packet, sending back a single ACK. 1) This single ACK might be lost TLP or RTO are forced to attempt a retransmit. 2) This ACK releases a full cwnd, sender sends another big GSO packet, in a ping pong mode. This behavior does not fill the pipes in the best way, because of scheduling artifacts. Make sure we always have at least two GSO packets in flight. This allows us to safely increase GRO efficiency without risking spurious retransmits. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:21:44 -05:00
Marcelo Leitner	19ca9fc144	vxlan: Do not reuse sockets for a different address family Currently, we only match against local port number in order to reuse socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will follow. The following steps reproduce it: # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \ srcport 5000 6000 dev eth0 # ip link add vxlan7 type vxlan id 43 group ff0e::110 \ srcport 5000 6000 dev eth0 # ip link set vxlan6 up # ip link set vxlan7 up <panic> [ 4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 ... [ 4.188076] Call Trace: [ 4.188085] [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630 [ 4.188098] [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan] [ 4.188113] [<ffffffff810a3430>] process_one_work+0x220/0x710 [ 4.188125] [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710 [ 4.188138] [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0 [ 4.188149] [<ffffffff810a3920>] ? process_one_work+0x710/0x710 So address family must also match in order to reuse a socket. Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:19:59 -05:00
Thomas Graf	6eba82248e	rhashtable: Drop gfp_flags arg in insert/remove functions Reallocation is only required for shrinking and expanding and both rely on a mutex for synchronization and callers of rhashtable_init() are in non atomic context. Therefore, no reason to continue passing allocation hints through the API. Instead, use GFP_KERNEL and add __GFP_NOWARN \| __GFP_NORETRY to allow for silent fall back to vzalloc() without the OOM killer jumping in as pointed out by Eric Dumazet and Eric W. Biederman. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:18:40 -05:00
David S. Miller	64bb7e9949	Merge branch 'mlx4-next' Or Gerlitz says: ==================== mlx4: Flexible (asymmetric) allocation of EQs and MSI-X vectors This series from Matan Barak is built as follows: The 1st two patches fix small bugs w.r.t firmware spec. Next are two patches which do more re-factoring of the init/fini flow and a patch that adds support for the QUERY_FUNC firmware command, these are all pre-steps for the major patch of the series. In this patch (#6) we change the order of talking/querying the firmware and enabling SRIOV. This allows to remote worst-case assumption w.r.t the number of available MSI-X vectors and EQs per function. The last patch easily enjoys this ordering change, to enable supports > 64 VFs over a firmware that allows for that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:28 -05:00
Matan Barak	de966c5928	net/mlx4_core: Support more than 64 VFs We now allow up to 126 VFs. Note though that certain firmware versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs. In these cases, we limit the maximum number of VFs to 64. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:22 -05:00
Matan Barak	7ae0e400cd	net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs Previously, the driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:21 -05:00
Matan Barak	e8c4265bea	net/mlx4_core: Add QUERY_FUNC firmware command QUERY_FUNC firmware command could be used in order to query the number of EQs, reserved EQs, etc for a specific function. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:19 -05:00
Matan Barak	a0eacca948	net/mlx4_core: Refactor mlx4_load_one Refactor mlx4_load_one, as a preparation step for a new and more complicated load function. The goal is to support both newer firmware that required init_hca to be done before enable_sriov and legacy firmwares that requires things to be done the other way around. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:18 -05:00
Matan Barak	ffc39f6d6f	net/mlx4_core: Refactor mlx4_cmd_init and mlx4_cmd_cleanup Refactoring mlx4_cmd_init and mlx4_cmd_cleanup such that partial init and cleanup are possible. After this refactoring, calling mlx4_cmd_init several times is safe. This is necessary in the VF init flow when mlx4_init_hca returns -EACCESS, we need to issue cleanup and re-attempt to call it with the slave flag. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:17 -05:00
Matan Barak	225c6c8c6b	net/mlx4_core: Use correct variable type for mlx4_slave_cap We've used an incorrect type for the loop counter and the mlx4_QUERY_FUNC_CAP function. The current input modifier is either a port or a boolean. Since the number of ports is always a positive value < 255, we should use u8 instead of an integer with casting. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:16 -05:00
Matan Barak	7c68dd435b	net/mlx4_core: Fix wrong reading of reserved_eqs We mistakenly read the reserved_eqs field as a standard numeric value rather than a log2 value. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:15 -05:00
David S. Miller	9cf5476bfd	Merge branch 'rhash_prove_locking' Herbert Xu says: ==================== rhashtable: Allow local locks to be used and tested This series moves mutex_is_held entirely under PROVE_LOCKING so there is zero foot print when we're not debugging. More importantly it adds a parrent argument to mutex_is_held so that we can test local locks rather than global ones (e.g., per-namespace locks). ==================== Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:13:13 -05:00
Herbert Xu	7b4ce23534	rhashtable: Add parent argument to mutex_is_held Currently mutex_is_held can only test locks in the that are global since it takes no arguments. This prevents rhashtable from being used in places where locks are lock, e.g., per-namespace locks. This patch adds a parent field to mutex_is_held and rhashtable_params so that local locks can be used (and tested). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:13:05 -05:00
Herbert Xu	1b2f309d70	rhashtable: Move mutex_is_held under PROVE_LOCKING The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch makes the mutex_is_held field in rhashtable optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:13:05 -05:00
Herbert Xu	1f501d6252	netfilter: Move mutex_is_held under PROVE_LOCKING The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch modifies netfilter so that we can rhashtable.h itself can later make mutex_is_held optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:13:05 -05:00
Herbert Xu	9712756620	netlink: Move mutex_is_held under PROVE_LOCKING The rhashtable function mutex_is_held is only used when PROVE_LOCKING is enabled. This patch modifies netlink so that we can rhashtable.h itself can later make mutex_is_held optional depending on PROVE_LOCKING. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:13:05 -05:00
Enric Balletbo i Serra	ccf899a27c	smsc911x: power-up phydev before doing a software reset. With commit `be9dad1f9f` ("net: phy: suspend phydev when going to HALTED"), the PHY device will be put in a low-power mode using BMCR_PDOWN if the the interface is set down. The smsc911x driver does a software_reset opening the device driver (ndo_open). In such case, the PHY must be powered-up before access to any register and before calling the software_reset function. Otherwise, as the PHY is powered down the software reset fails and the interface can not be enabled again. This patch fixes this scenario that is easy to reproduce setting down the network interface and setting up again. $ ifconfig eth0 down $ ifconfig eth0 up ifconfig: SIOCSIFFLAGS: Input/output error Signed-off-by: Enric Balletbo i Serra <eballetbo@iseebcn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:09:28 -05:00
Hisashi Nakamura	9488e1e5b3	net: sh_eth: Add r8a7793 support The device tree probing for R-Car M2N (r8a7793) is added. Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:03:53 -05:00
Hisashi Nakamura	966d6dbb6b	net: sh_eth: Add RMII mode setting in probe When using RMMI mode, it is necessary to change in probe. Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:03:53 -05:00
Michal Kubeček	fbe168ba91	net: generic dev_disable_lro() stacked device handling Large receive offloading is known to cause problems if received packets are passed to other host. Therefore the kernel disables it by calling dev_disable_lro() whenever a network device is enslaved in a bridge or forwarding is enabled for it (or globally). For virtual devices we need to disable LRO on the underlying physical device (which is actually receiving the packets). Current dev_disable_lro() code handles this propagation for a vlan (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan. It doesn't handle other stacked devices and their combinations, in particular propagation from a bond to its slaves which often causes problems in virtualization setups. As we now have generic data structures describing the upper-lower device relationship, dev_disable_lro() can be generalized to disable LRO also for all lower devices (if any) once it is disabled for the device itself. For bonding and teaming devices, it is necessary to disable LRO not only on current slaves at the moment when dev_disable_lro() is called but also on any slave (port) added later. v2: use lower device links for all devices (including vlan and macvlan) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:48:56 -05:00
Dan Carpenter	b6267d3e80	amd-xgbe: fix ->rss_hash_type There was a missing break statement so we set everything to PKT_HASH_TYPE_L3 even when we intended to use PKT_HASH_TYPE_L4. Fixes: `5b9dfe299e` ('amd-xgbe: Provide support for receive side scaling') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:39:54 -05:00
Herbert Xu	0c828f2f83	lib: rhashtable - Remove weird non-ASCII characters from comments My editor spewed garbage that looked like memory corruption on my screen. It turns out that a number of occurences of "fi" got turned into a ligature. This patch replaces these ligatures with the ASCII letters "fi". Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:38:46 -05:00
Alexander Kochetkov	6ff53fd371	net/smsc911x: Fix delays in the PHY enable/disable routines Increased delay in the smsc911x_phy_disable_energy_detect (from 1ms to 2ms). Dropped delays in the smsc911x_phy_enable_energy_detect (100ms and 1ms). The patch affect SMSC LAN generation 4 chips with integrated PHY (LAN9221). I saw problems with soft reset due to wrong udelay timings. After I fixed udelay, I measured the time needed to bring integrated PHY from power-down to operational mode (the time beetween clearing EDPWRDOWN bit and soft reset complete event). I got 1ms (measured using ktime_get). The value is equal to the current value (1ms) used in the smsc911x_phy_disable_energy_detect. It is near the upper bound and in order to avoid rare soft reset faults it is doubled (2ms). I don't know official timing for bringing up integrated PHY as specs doesn't clarify this (or may be I didn't found). It looks safe to drop delays before and after setting EDPWRDOWN bit (enable PHY power-down mode). I didn't saw any regressions with the patch. The patch was reviewed by Steve Glendinning and Microchip Team. Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:37:53 -05:00
Alexander Kochetkov	242bcd5ba1	net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode The patch affect SMSC LAN generation 4 chips with integrated PHY (LAN9221). It is possible that PHY could enter power-down mode (ENERGYON clear), between ENERGYON bit check in smsc911x_phy_disable_energy_detect and SRST bit set in smsc911x_soft_reset. This could happen, for example, if someone disconnect ethernet cable between the checks. The PHY in a power-down mode would prevent the MAC portion of chip to be software reseted. Initially found by code review, confirmed later using test case. This is low probability issue, and in order to reproduce it you have to run the script: while true; do ifconfig eth0 down ifconfig eth0 up \|\| break done While the script is running you have to plug/unplug ethernet cable many times (using gpio controlled ethernet switch, for example) until get: [ 4516.477783] ADDRCONF(NETDEV_UP): eth0: link is not ready [ 4516.512207] smsc911x smsc911x.0: eth0: SMSC911x/921x identified at 0xce006000, IRQ: 336 [ 4516.524658] ADDRCONF(NETDEV_UP): eth0: link is not ready [ 4516.559082] smsc911x smsc911x.0: eth0: SMSC911x/921x identified at 0xce006000, IRQ: 336 [ 4516.571990] ADDRCONF(NETDEV_UP): eth0: link is not ready ifconfig: SIOCSIFFLAGS: Input/output error The patch was reviewed by Steve Glendinning and Microchip Team. Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:37:53 -05:00
Anish Bhatt	d7990b0c34	cxgb4i/cxgb4 : Refactor macros to conform to uniform standards Refactored all macros used in cxgb4i as part of previously started cxgb4 macro names cleanup. Makes them more uniform and avoids namespace collision. Minor changes in other drivers where required as some of these macros are used by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:36:22 -05:00
Jason Wang	8c847d2541	tun: fix issues of iovec iterators using in tun_put_user() This patch fixes two issues after using iovec iterators: - vlan_offset should be initialized to zero, otherwise unexpected offset will be used in skb_copy_datagram_iter() - advance iovec iterator when vnet_hdr_sz is greater than sizeof(gso), this is the case when mergeable rx buffer were enabled for a virt guest. Fixes `e0b46d0ee9` ("tun: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:33:22 -05:00
Thomas Graf	882288c05e	FOU: Fix no return statement warning for !CONFIG_NET_FOU_IP_TUNNELS net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’: net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type] Fixes: `a8c5f90fb5` ("ip_tunnel: Ops registration for secondary encap (fou, gue)") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:32:00 -05:00
Ilya Dryomov	cc9f1f518c	libceph: change from BUG to WARN for __remove_osd() asserts No reason to use BUG_ON for osd request list assertions. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-11-13 22:26:34 +03:00
Ilya Dryomov	ba9d114ec5	libceph: clear r_req_lru_item in __unregister_linger_request() kick_requests() can put linger requests on the notarget list. This means we need to clear the much-overloaded req->r_req_lru_item in __unregister_linger_request() as well, or we get an assertion failure in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item). AFAICT the assumption was that registered linger requests cannot be on any of req->r_req_lru_item lists, but that's clearly not the case. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-11-13 22:21:14 +03:00
Ilya Dryomov	a390de0208	libceph: unlink from o_linger_requests when clearing r_osd Requests have to be unlinked from both osd->o_requests (normal requests) and osd->o_linger_requests (linger requests) lists when clearing req->r_osd. Otherwise __unregister_linger_request() gets confused and we trip over a !list_empty(&osd->o_linger_requests) assert in __remove_osd(). MON=1 OSD=1: # cat remove-osd.sh #!/bin/bash rbd create --size 1 test DEV=$(rbd map test) ceph osd out 0 sleep 3 rbd map dne/dne # obtain a new osdmap as a side effect rbd unmap $DEV & # will block sleep 3 ceph osd in 0 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-11-13 22:21:13 +03:00
Ilya Dryomov	aaef31703a	libceph: do not crash on large auth tickets Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth tickets will have their buffers vmalloc'ed, which leads to the following crash in crypto: [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0 [ 28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.686032] PGD 0 [ 28.688088] Oops: 0000 [#1] PREEMPT SMP [ 28.688088] Modules linked in: [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305 [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.688088] Workqueue: ceph-msgr con_work [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000 [ 28.688088] RIP: 0010:[<ffffffff81392b42>] [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80 [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286 [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0 [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750 [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880 [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010 [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000 [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0 [ 28.688088] Stack: [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32 [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020 [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010 [ 28.688088] Call Trace: [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40 [ 28.688088] [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220 [ 28.688088] [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180 [ 28.688088] [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30 [ 28.688088] [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0 [ 28.688088] [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0 [ 28.688088] [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60 [ 28.688088] [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0 [ 28.688088] [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360 [ 28.688088] [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0 [ 28.688088] [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80 [ 28.688088] [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0 [ 28.688088] [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80 [ 28.688088] [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0 [ 28.688088] [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0 [ 28.688088] [<ffffffff81559289>] try_read+0x1e59/0x1f10 This is because we set up crypto scatterlists as if all buffers were kmalloc'ed. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-11-13 22:21:12 +03:00
Yan, Zheng	3231300bb9	ceph: fix flush tid comparision TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid is only 16 bits. 16 bits should be plenty to let the cap flush updates pipeline appropriately, but we need to cast in the proper direction when comparing these differently-sized versions. So downcast the 64-bits one to 16 bits. Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-11-13 22:19:05 +03:00
Linus Torvalds	3b98ec4eec	sound fixes for 3.18-rc5 Things get calming down, now we have only a few fix patches: a trivial fix for memory leak in usb-audio, a patch for the new HD-audio PCI id, a device-specific mute-LED fix, and a slightly big patch to cover the missing COEF inits of various Realtek codecs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJUZJv4AAoJEGwxgFQ9KSmkcmwP/0o2Z6B/N60dLHMW6kMQ1+KY Zr5dJzFMp195NALBjdDcQ6Q5ETJtkTc71wCUVcrfraO0yh9FzvnwN6fhfHI+/Mck h2SNqPfokpdgQgG6wN8RP6Lp/Fi34D3p2fcSK28YrXuU1AruHLp5TQ1qrO0WMLqV 1Qm0TOWJXaPdC+/Z8HWgzd9Qdb+Tig8/fM72cQrKrv5QdLEZ6+wMTT0mgYJ3sXMa PDlLp4VaO+eUMVwbg3TB0hi3S9KJoO5a0oO3R2GWm3USmGkoVUVyJ4RoWKOomha3 B873c4/zUXUjQdlloi5yNMcwTPy8fQpFXJDkUatporlKjzmQXCp7HPfYkg8y9U5J anD0Mk8vwchgjpqyIdHR/hf4CCN1fPq/TgWq//LRtELgaAA4zMKq0SCElUTsT+0l BF5filemUDWFMY86Urr4tB1lOj3qRfo9spiC8Tt64/tqHDoDbANA853wvBQbsxxO P3mvdDlz+WotNaEB5HdxxXdmrV6V8PnnsNF3Z3JZ/7M7sqssBaGwfnB3EJIY89eF ACAlaOnGoV1YJBuJ+ans2xhgGaJPbpAl7POI5/tB0xIVb6esBgeVNuCSRKLwPie4 HdJ0R7XobKJqX5A8e5L8SlJUSG5pebdCVwl97hXw18L14Ml5U0GiK+3c5uC4FzkV iUQ3EyijgsPqRKK3THSk =EhoH -----END PGP SIGNATURE----- Merge tag 'sound-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Things get calming down, now we have only a few fix patches: a trivial fix for memory leak in usb-audio, a patch for the new HD-audio PCI id, a device-specific mute-LED fix, and a slightly big patch to cover the missing COEF inits of various Realtek codecs" * tag 'sound-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add mute LED control for Lenovo Ideapad Z560 ALSA: hda/realtek - Change EAPD to verb control ALSA: usb-audio: Fix memory leak in FTU quirk ALSA: hda_intel: Add DeviceIDs for Sunrise Point-LP	2014-11-13 09:57:04 -08:00
Linus Torvalds	2c54396e40	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull SELinux fixlet from James Morris: "WARN_ONCE() here will unnecessarily terrify users" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: selinux: convert WARN_ONCE() to printk() in selinux_nlmsg_perm()	2014-11-13 09:46:15 -08:00
Linus Torvalds	911883759f	Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit Pull audit fixes from Paul Moore: "After he sent the initial audit pull request for 3.18, Eric asked me to take over the management of the audit tree, hence this pull request to fix a couple of problems with audit. As you can see below, the changes are minimal: adding some whitespace to a string so userspace parses it correctly, and fixing a problem with audit's usage of fsnotify that was causing audit watch rules to be lost. Neither of these patches were very controversial on the mailing lists and they fix real problems, getting them into 3.18 would be a good thing" * 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit: audit: keep inode pinned audit: AUDIT_FEATURE_CHANGE message format missing delimiting space	2014-11-13 09:36:39 -08:00
Linus Torvalds	5a7a662cc6	. stable fix for dm-thin that avoids normal IO racing with discard . stable fix for a dm-cache related bug in dm-btree walking code that results from using very large fast device (e.g. 4T) with a very small cache blocksize (e.g. 32K) -- this is a very uncommon configuration . a couple fixes for dm-raid (one for stable and the other addresses a crash in 3.18-rc1 code) . stable fix for dm-thinp that addresses a very rare dm-bufio bug having to do with memory reclaimation (via shrinker) when using dm-thinp ontop of loopback devices . fix a leak in dm-stripe target constructor's error path -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUY/p7AAoJEMUj8QotnQNaxPEIAJsmJC5ujQAIdm5yUxsOWruU Y/36HbPvlmV8fgWqGyjaubBrzqgWry/yW/u/Sv9+9rE3Zh6JSVLVrCA6uZZ3Yr+j HKYEPjm/O0zVJepfEDKtjG6dxeaql47+luwU1iP1bAYeZE3zmKn1oFT2GW5gTbxO 2n3MiN/dyX8v0cTw6r0O69luIAu93CSY0XDk+1ynfKlKKVmgcAUPvKuobF+yHXoF Rd7KTqFoK6HgRhdUHvUQnCGDandZ9MHjt3oW9p3dv3ezvW1cNUARoVHMRGG6Awfu WZkQ/VORDeaJT+bhjGfPIla1HbgxEKJrgzTUlpj+P6K2uPK2f6ECEyBpDLWKy9g= =lkSu -----END PGP SIGNATURE----- Merge tag 'dm-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - stable fix for dm-thin that avoids normal IO racing with discard - stable fix for a dm-cache related bug in dm-btree walking code that results from using very large fast device (eg 4T) with a very small cache blocksize (eg 32K) -- this is a very uncommon configuration - a couple fixes for dm-raid (one for stable and the other addresses a crash in 3.18-rc1 code) - stable fix for dm-thinp that addresses a very rare dm-bufio bug having to do with memory reclaimation (via shrinker) when using dm-thinp ontop of loopback devices - fix a leak in dm-stripe target constructor's error path * tag 'dm-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm btree: fix a recursion depth bug in btree walking code dm thin: grab a virtual cell before looking up the mapping dm raid: fix inaccessible superblocks causing oops in configure_discard_support dm raid: ensure superblock's size matches device's logical block size dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks dm stripe: fix potential for leak in stripe_ctr error path	2014-11-13 09:19:20 -08:00
James Morris	09c6268927	Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/selinux into for-linus	2014-11-13 21:49:53 +11:00

1 2 3 4 5 ...

482541 Commits