linux

Author	SHA1	Message	Date
Juliet Kim	7d7195a026	ibmvnic: Do not process device remove during device reset The ibmvnic driver does not check the device state when the device is removed. If the device is removed while a device reset is being processed, the remove may free structures needed by the reset, causing an oops. Fix this by checking the device state before processing device remove. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:50:42 -07:00
Karsten Graul	ece0d7bd74	net/smc: cancel event worker during device removal During IB device removal, cancel the event worker before the device structure is freed. Fixes: `a4cf0443c4` ("smc: introduce SMC as an IB-client") Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:40:33 -07:00
Hangbin Liu	60380488e4	ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: `74235a25c6` ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: `1666d49e1d` ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:37:49 -07:00
Linus Torvalds	f35111a946	Another update for the .clang-format macro list It has been a while since the last time I sent one! -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl5nujkACgkQGXyLc2ht IW3cUBAAtLrfdsFPr/uLeFpHFIPm+X3ZAq/nOQBdrerR4GJfMD/zDN5FjKiXlsfo KWIk/kG0nyqEnRrwX/oh8XqGdx3kgGuniZY9Sjo9XOMggS4pdYZBYo/6LVnJ1Mgo 2AOb9z+5wqOvsOLeIH46M0avOk2V2+hdlClx/ns6WeQnTipqbMzgWAfdnsTWk3i7 ApK4Sc/1yQPKz1Nhr/REcADU38x2fyfHj1r2EvCm2E3ljC/che5LkvCmZHr5Iqud iC/BooxdeCsFfN4mvl17IBsK1iveeRt+ZFW4NsijiMNBYrfUQ74PxEre+iCn1uk3 +KLsBnYVsqkrczyGPHjtZJnonEOjte7BQp076vA2QcTRDS3akAfdm94QsUwHJq39 iJVo8LHJhLPoIgfdhml3RFDLSA5FB7O5feC6gDNgPnq7X9FqNeNvJRJzqYTe3TtE UqXpbbqFf4I+1GOBOmOSOL8eSBfnJVoTOwVJsvcMF1SBfwpyYyQApOXsIQjKcZH+ KGIoUTd6bS6xc4FMx/6/O2Z7aWUaH72X1jADo0uP947BhQGQDcnRmH/cu+IqXceB sHs6a2a3wuJ9iqDb3fT1tEF/k2n9X5y9PPYqRynD1320E9OWihavG89+3m5BrfJd NFSxp8Buw72hjbTJDRv79RtVhZcYdUOAauO26IX+3K5X9ThZfrk= =t4To -----END PGP SIGNATURE----- Merge tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux Pull clang-format update from Miguel Ojeda: "Another update for the .clang-format macro list It has been a while since the last time I sent one!" * tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux: clang-format: Update with the latest for_each macro list	2020-03-10 15:36:27 -07:00
Shakeel Butt	d752a49865	net: memcg: late association of sock to memcg If a TCP socket is allocated in IRQ context or cloned from unassociated (i.e. not associated to a memcg) in IRQ context then it will remain unassociated for its whole life. Almost half of the TCPs created on the system are created in IRQ context, so, memory used by such sockets will not be accounted by the memcg. This issue is more widespread in cgroup v1 where network memory accounting is opt-in but it can happen in cgroup v2 if the source socket for the cloning was created in root memcg. To fix the issue, just do the association of the sockets at the accept() time in the process context and then force charge the memory buffer already used and reserved by the socket. Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:33:05 -07:00
Shakeel Butt	e876ecc67d	cgroup: memcg: net: do not associate sock with unrelated cgroup We are testing network memory accounting in our setup and noticed inconsistent network memory usage and often unrelated cgroups network usage correlates with testing workload. On further inspection, it seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in irq context specially for cgroup v1. mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context and kind of assumes that this can only happen from sk_clone_lock() and the source sock object has already associated cgroup. However in cgroup v1, where network memory accounting is opt-in, the source sock can be unassociated with any cgroup and the new cloned sock can get associated with unrelated interrupted cgroup. Cgroup v2 can also suffer if the source sock object was created by process in the root cgroup or if sk_alloc() is called in irq context. The fix is to just do nothing in interrupt. WARNING: Please note that about half of the TCP sockets are allocated from the IRQ context, so, memory used by such sockets will not be accouted by the memcg. The stack trace of mem_cgroup_sk_alloc() from IRQ-context: CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1 Hardware name: ... Call Trace: <IRQ> dump_stack+0x57/0x75 mem_cgroup_sk_alloc+0xe9/0xf0 sk_clone_lock+0x2a7/0x420 inet_csk_clone_lock+0x1b/0x110 tcp_create_openreq_child+0x23/0x3b0 tcp_v6_syn_recv_sock+0x88/0x730 tcp_check_req+0x429/0x560 tcp_v6_rcv+0x72d/0xa40 ip6_protocol_deliver_rcu+0xc9/0x400 ip6_input+0x44/0xd0 ? ip6_protocol_deliver_rcu+0x400/0x400 ip6_rcv_finish+0x71/0x80 ipv6_rcv+0x5b/0xe0 ? ip6_sublist_rcv+0x2e0/0x2e0 process_backlog+0x108/0x1e0 net_rx_action+0x26b/0x460 __do_softirq+0x104/0x2a6 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq.part.19+0x40/0x50 __local_bh_enable_ip+0x51/0x60 ip6_finish_output2+0x23d/0x520 ? ip6table_mangle_hook+0x55/0x160 __ip6_finish_output+0xa1/0x100 ip6_finish_output+0x30/0xd0 ip6_output+0x73/0x120 ? __ip6_finish_output+0x100/0x100 ip6_xmit+0x2e3/0x600 ? ipv6_anycast_cleanup+0x50/0x50 ? inet6_csk_route_socket+0x136/0x1e0 ? skb_free_head+0x1e/0x30 inet6_csk_xmit+0x95/0xf0 __tcp_transmit_skb+0x5b4/0xb20 __tcp_send_ack.part.60+0xa3/0x110 tcp_send_ack+0x1d/0x20 tcp_rcv_state_process+0xe64/0xe80 ? tcp_v6_connect+0x5d1/0x5f0 tcp_v6_do_rcv+0x1b1/0x3f0 ? tcp_v6_do_rcv+0x1b1/0x3f0 __release_sock+0x7f/0xd0 release_sock+0x30/0xa0 __inet_stream_connect+0x1c3/0x3b0 ? prepare_to_wait+0xb0/0xb0 inet_stream_connect+0x3b/0x60 __sys_connect+0x101/0x120 ? __sys_getsockopt+0x11b/0x140 __x64_sys_connect+0x1a/0x20 do_syscall_64+0x51/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The stack trace of mem_cgroup_sk_alloc() from IRQ-context: Fixes: `2d75807383` ("mm: memcontrol: consolidate cgroup socket tracking") Fixes: `d979a39d72` ("cgroup: duplicate cgroup reference when cloning sockets") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:33:05 -07:00
Linus Torvalds	2a48b37931	A few minor auxdisplay improvements: - charlcd: replace zero-length array with flexible-array member From the kernel-wide cleanup Gustavo A. R. Silva is performing - img-ascii-lcd: convert to devm_platform_ioremap_resource From Yangtao Li - Fix Kconfig indentation From Krzysztof Kozlowski -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl5nuYwACgkQGXyLc2ht IW1bJQ//SghpbB2ArCM7/D2lblUs2DWdk3FvEpfQ4RmowFAM7bWZXH9w8Wfym+33 dUUXMK3mTi7FFFFsYuX1hEP3KFctfP5M262pmXX/fNRZj36g3zwIr63Vpx3l3cdg 5WwSxHUpJ8VpXfq+/wqNGNV+KenyIQDOyHefkf9zrhs/C5Da2YMwRTNK8WOITrgm W/RBeCBYsTRbXKJ9REXFbUFq0AawV1Fd6OzDTxbPmywRbtmGD9tyelcHPQSxasI7 TMGJ7g2JVfdcM8jJFt7EKsehPgv2/2+oj5OxhY1m8L/tiIyIL7qwqMX5iG01YOkF pFEvTEGrSWSIruD+tMeV0nlVqBW0ZSwOFv3DkO5g3Yfn5SEDV3jRrcsw/BpcXOcH Tm1D3/AGT9mRtN1+gt3JnobGm0XJadu5u/dWngrM7KF9bua+dCMAlmIBugblwvwJ bTkMleiLHk3rA3AJLLreywbYNW/9qejbQ5EODHr7UrPSA0LPTWNPWD1fBsMKt/hw MXUVWMfLuo6sG9Tz9i3Suru1thrVzFHRVYioxfjUL6VhllyK//qzPIHk/oRKaiIo 1hDs39fxptRgFggzIaKTuBvra9Uf+UOn/7fUIU3iNWm6OlXprGAOPMWyOIw/eC0s lxsYDzyOy7KCCFOU7cITAeU3eZhHzsCMR8dWGGkz8+KHG5WfnjI= =63zv -----END PGP SIGNATURE----- Merge tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux Pull auxdisplay updates from Miguel Ojeda: "A few minor auxdisplay improvements: - charlcd: replace zero-length array with flexible-array member (kernel-wide cleanup by Gustavo A. R. Silva) - img-ascii-lcd: convert to devm_platform_ioremap_resource (Yangtao Li) - Fix Kconfig indentation (Krzysztof Kozlowski) * tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux: auxdisplay: charlcd: replace zero-length array with flexible-array member auxdisplay: img-ascii-lcd: convert to devm_platform_ioremap_resource auxdisplay: Fix Kconfig indentation	2020-03-10 15:32:57 -07:00
Jakub Kicinski	65dfcf0807	MAINTAINERS: update cxgb4vf maintainer to Vishal Casey Leedomn <leedom@chelsio.com> is bouncing, Vishal indicated he's happy to take the role. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-10 15:29:47 -07:00
Linus Torvalds	e941484541	Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - cgroup.procs listing related fixes. It didn't interlock properly with exiting tasks leaving a short window where a cgroup has empty cgroup.procs but still can't be removed and misbehaved on short reads. - psi_show() crash fix on 32bit ino archs - Empty release_agent handling fix * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup1: don't call release_agent when it is "" cgroup: fix psi_show() crash on 32bit ino archs cgroup: Iterate tasks that did not finish do_exit() cgroup: cgroup_procs_next should increase position index cgroup-v1: cgroup_pidlist_next should update position index	2020-03-10 15:05:45 -07:00
Linus Torvalds	2c1aca4bd3	Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Workqueue has been incorrectly round-robining per-cpu work items. Hillf's patch fixes that. The other patch documents memory-ordering properties of workqueue operations" * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: don't use wq_select_unbound_cpu() for bound works workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()	2020-03-10 14:48:22 -07:00
Hersen Wu	1d2686d417	drm/amdgpu/powerplay: nv1x, renior copy dcn clock settings of watermark to smu during boot up dc to pplib interface is changed for navi1x, renoir. display_config_changed is not called by dc anymore. smu_write_watermarks_table is not executed for navi1x, renoir during boot up. solution: call smu_write_watermarks_table just after dc pass watermark clock settings to pplib Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-03-10 17:31:10 -04:00
Kan Liang	b95fcd2c1c	perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake and Cascade Lake. Committer testing: On a Lenovo T480S, Intel(R) Core(TM) i7-8650U Kaby Lake, that looking at x86's mapfile.csv file is a: $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv GenuineIntel-6-[4589]E,v24,skylake,core $ So uses the constraint added in this patch in this file: tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json Before: # perf stat -a -M Page_Walks_Utilization sleep 2 Performance counter stats for 'system wide': <not counted> itlb_misses.walk_pending (0.00%) <not counted> dtlb_load_misses.walk_pending (0.00%) <not counted> dtlb_store_misses.walk_pending (0.00%) <not counted> ept.walk_pending (0.00%) <not counted> cycles (0.00%) 2.001750514 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog The events in group usually have to be from the same PMU. Try reorganizing the group. # After: # perf stat -a -M Page_Walks_Utilization sleep 2 Splitting metric group Page_Walks_Utilization into standalone metrics. Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog , Performance counter stats for 'system wide': 36,883,102 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization (79.99%) 123,104,146 dtlb_load_misses.walk_pending (80.02%) 13,720,795 dtlb_store_misses.walk_pending (79.99%) 0 ept.walk_pending (79.99%) 1,519,948,400 cycles (80.01%) 2.002170780 seconds time elapsed # Before and after, if we disable the nmi_watchdog we get: # echo 0 > /proc/sys/kernel/nmi_watchdog # perf stat -a -M Page_Walks_Utilization sleep 2 Performance counter stats for 'system wide': 33,721,658 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization 84,070,996 dtlb_load_misses.walk_pending 9,816,071 dtlb_store_misses.walk_pending 0 ept.walk_pending 704,920,899 cycles 2.002331670 seconds time elapsed # More information about the metric expressions: # perf stat -v -a -M Page_Walks_Utilization sleep 2 Using CPUID GenuineIntel-6-8E-A metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization found event itlb_misses.walk_pending found event dtlb_load_misses.walk_pending found event dtlb_store_misses.walk_pending found event ept.walk_pending found event cycles adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W -> cpu/umask=0x10,(null)=0x186a3,event=0x85/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/ -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/ itlb_misses.walk_pending: 8085772 16010162799 16010162799 dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799 dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799 ept.walk_pending: 2 16010162799 16010162799 cycles: 315140605 16010162799 16010162799 Performance counter stats for 'system wide': 8,085,772 itlb_misses.walk_pending # 0.1 Page_Walks_Utilization 28,134,579 dtlb_load_misses.walk_pending 7,276,535 dtlb_store_misses.walk_pending 2 ept.walk_pending 315,140,605 cycles 2.002333181 seconds time elapsed # Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 14:56:46 -03:00
Kan Liang	ab483d8bc8	perf metricgroup: Support metric constraint Some metric groups have metric constraints. A metric group can be scheduled as a group only when some constraints are applied. For example, Page_Walks_Utilization has a metric constraint, "NO_NMI_WATCHDOG". When NMI watchdog is disabled, the metric group can be scheduled as a group. Otherwise, splitting the metric group into standalone metrics. Add a new function, metricgroup__has_constraint(), to check whether all constraints are applied. If not, splitting the metric group into standalone metrics. Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a warning for the metric group with the constraint, when NMI WATCHDOG is enabled. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 14:47:50 -03:00
Kan Liang	2a14c1bf01	perf util: Factor out sysctl__nmi_watchdog_enabled() The NMI watchdog status is required for metric group constraint examination. Factor out sysctl__nmi_watchdog_enabled() to retrieve the NMI watchdog status. Users may count more than one metric group each time. If so, the NMI watchdog status may be retrieved several times. To reduce the overhead, cache the NMI watchdog status. Replace the NMI watchdog status checking in print_footer() by sysctl__nmi_watchdog_enabled(). Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 14:46:19 -03:00
Kan Liang	f742634ab4	perf metricgroup: Factor out metricgroup__add_metric_weak_group() Factor out metricgroup__add_metric_weak_group() which add metrics into a weak group. The change can improve code readability. Because following patch will introduce a function which add standalone metrics. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 14:44:36 -03:00
Kan Liang	03fe02b113	perf jevents: Support metric constraint A new field "MetricConstraint" is introduced in JSON event list. Extend jevents to parse the field and save the value in metric_constraint. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/1582581564-184429-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 14:43:05 -03:00
Tejun Heo	dcd6589b11	blk-iocost: fix incorrect vtime comparison in iocg_is_idle() vtimes may wrap and time_before/after64() should be used to determine whether a given vtime is before or after another. iocg_is_idle() was incorrectly using plain "<" comparison do determine whether done_vtime is before vtime. Here, the only thing we're interested in is whether done_vtime matches vtime which indicates that there's nothing in flight. Let's test for inequality instead. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `7caa47151a` ("blkcg: implement blk-iocost") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 11:37:00 -06:00
Thomas Richter	e7950166e4	perf vendor events s390: Add new deflate counters for IBM z15 Add support for new deflate counters: - Counter 247: cycles CPU spent obtaining access to Deflate unit - Counter 252: cycles CPU is using Deflate unit - Counter 264: Increments by one for every DEFLATE CONVERSION CALL instruction executed. - Counter 265: Increments by one for every DEFLATE CONVERSION CALL instruction executed that ended in Condition Codes 0, 1 or 2. Also adjust the some crypto counter description to latest documentation. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200310142937.32045-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-10 11:40:21 -03:00
Hillf Danton	aa202f1f56	workqueue: don't use wq_select_unbound_cpu() for bound works wq_select_unbound_cpu() is designed for unbound workqueues only, but it's wrongly called when using a bound workqueue too. Fixing this ensures work queued to a bound workqueue with cpu=WORK_CPU_UNBOUND always runs on the local CPU. Before, that would happen only if wq_unbound_cpumask happened to include it (likely almost always the case), or was empty, or we got lucky with forced round-robin placement. So restricting /sys/devices/virtual/workqueue/cpumask to a small subset of a machine's CPUs would cause some bound work items to run unexpectedly there. Fixes: `ef55718044` ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Hillf Danton <hdanton@sina.com> [dj: massage changelog] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>	2020-03-10 10:30:51 -04:00
Hamish Martin	3747cd2efe	i2c: gpio: suppress error on probe defer If a GPIO we are trying to use is not available and we are deferring the probe, don't output an error message. This seems to have been the intent of commit `05c7477885` ("i2c: gpio: Add support for named gpios in DT") but the error was still output due to not checking the updated 'retdesc'. Fixes: `05c7477885` ("i2c: gpio: Add support for named gpios in DT") Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-03-10 12:31:55 +01:00
Wolfram Sang	bcf3588d8e	macintosh: windfarm: fix MODINFO regression Commit `af503716ac` made sure OF devices get an OF style modalias with I2C events. It assumed all in-tree users were converted, yet it missed some Macintosh drivers. Add an OF module device table for all windfarm drivers to make them automatically load again. Fixes: `af503716ac` ("i2c: core: report OF style module alias for devices registered via OF") Link: https://bugzilla.kernel.org/show_bug.cgi?id=199471 Reported-by: Erhard Furtner <erhard_f@mailbox.org> Tested-by: Erhard Furtner <erhard_f@mailbox.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org # v4.17+	2020-03-10 12:30:59 +01:00
Jarkko Nikula	9be8bc4dd6	i2c: designware-pci: Fix BUG_ON during device removal Function i2c_dw_pci_remove() -> pci_free_irq_vectors() -> pci_disable_msi() -> free_msi_irqs() will throw a BUG_ON() for MSI enabled device since the driver has not released the requested IRQ before calling the pci_free_irq_vectors(). Here driver requests an IRQ using devm_request_irq() but automatic release happens only after remove callback. Fix this by explicitly freeing the IRQ before calling pci_free_irq_vectors(). Fixes: `21aa3983d6` ("i2c: designware-pci: Switch over to MSI interrupts") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-03-10 10:33:52 +01:00
Qian Cai	f515241652	iommu/vt-d: Silence RCU-list debugging warnings Similar to the commit `02d715b4a8` ("iommu/vt-d: Fix RCU list debugging warnings"), there are several other places that call list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence those false positives as well. drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97 drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97 drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffffa71892c8 (dmar_global_lock){++++}, at: intel_iommu_init+0x61a/0xb13 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-03-10 10:30:52 +01:00
Qian Cai	2d48ea0efb	iommu/vt-d: Fix RCU-list bugs in intel_iommu_init() There are several places traverse RCU-list without holding any lock in intel_iommu_init(). Fix them by acquiring dmar_global_lock. WARNING: suspicious RCU usage ----------------------------- drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/1. Call Trace: dump_stack+0xa0/0xea lockdep_rcu_suspicious+0x102/0x10b intel_iommu_init+0x947/0xb13 pci_iommu_init+0x26/0x62 do_one_initcall+0xfe/0x500 kernel_init_freeable+0x45a/0x4f8 kernel_init+0x11/0x139 ret_from_fork+0x3a/0x50 DMAR: Intel(R) Virtualization Technology for Directed I/O Fixes: `d8190dc638` ("iommu/vt-d: Enable DMA remapping after rmrr mapped") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-03-10 10:30:33 +01:00
Mika Westerberg	04bbb97d1b	i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device Martin noticed that nct6775 driver does not load properly on his system in v5.4+ kernels. The issue was bisected to commit `b84398d6d7` ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond") but it is likely not the culprit because the faulty code has been in the driver already since commit `9424693035` ("i2c: i801: Create iTCO device on newer Intel PCHs"). So more likely some commit that added PCI IDs of recent chipsets made the driver to create the iTCO_wdt device on Martins system. The issue was debugged to be PCI configuration access to the PMC device that is not present. This returns all 1's when read and this caused the iTCO_wdt driver to accidentally request resourses used by nct6775. It turns out that the SMI resource is only required for some ancient systems, not the ones supported by this driver. For this reason do not populate the SMI resource at all and drop all the related code. The driver now always populates the main I/O resource and only in case of SPT (Intel Sunrisepoint) compatible devices it adds another resource for the NO_REBOOT bit. These two resources are of different types so platform_get_resource() used by the iTCO_wdt driver continues to find the both resources at index 0. Link: https://lore.kernel.org/linux-hwmon/CAM1AHpQ4196tyD=HhBu-2donSsuogabkfP03v1YF26Q7_BgvgA@mail.gmail.com/ Fixes: `9424693035` ("i2c: i801: Create iTCO device on newer Intel PCHs") [wsa: complete fix needs all of http://patchwork.ozlabs.org/project/linux-i2c/list/?series=160959&state=*] Reported-by: Martin Volf <martin.volf.42@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-03-10 10:27:11 +01:00
Mika Westerberg	e42b0c2438	watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional The iTCO_wdt driver only needs ICH_RES_IO_SMI I/O resource when either turn_SMI_watchdog_clear_off module parameter is set to match ->iTCO_version (or higher), and when legacy iTCO_vendorsupport is set. Modify the driver so that ICH_RES_IO_SMI is optional if the two conditions are not met. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-03-10 10:20:37 +01:00
Mika Westerberg	7ca6ee3890	watchdog: iTCO_wdt: Export vendorsupport In preparation for making ->smi_res optional the iTCO_wdt driver needs to know whether vendorsupport is being set to non-zero. For this reason export the variable. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-03-10 10:20:27 +01:00
Jani Nikula	b74f241d71	Merge tag 'gvt-fixes-2020-03-10' of https://github.com/intel/gvt-linux into drm-intel-fixes gvt-fixes-2020-03-10 - Fix vgpu idr destroy causing timer destroy failure (Zhenyu) - Fix VBT size (Tina) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310080933.GE28483@zhen-hp.sh.intel.com	2020-03-10 11:16:43 +02:00
Rafael J. Wysocki	bce74b1feb	linux-cpupower-5.6-rc6 This cpupower update for Linux 5.6-rc6 consists of a fix from Mike Gilbert for build failures when -fno-common is enabled. -fno-common will be default in gcc v10. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl5i5ZAACgkQCwJExA0N Qxxq7g//aogwkyv4OQlAZgObu1NX9V/8U7dPSkzIGk8Npn2MTvtHzfNxGJ92CxtL PmzRVlpWwGWZqrTWPerIGimZ2agKrxaHkMvdngu5eoZb746TXDPN2sjHVpmzF27o ph0OFyjtocNzBhw66hhevVFhxJ5Y18k0s5KqjG50fLmWJIfAOT0wGtnlk99I5qye 3AbqElm+dEt+zAWkab6cPOqsl4t3AXWq4DRMH6ocEY+RC5HgBpT931D1bYzHRDhl sYGY2FNtJmIXInpH2Sl9TH5QeeOv29+V2bqHvcym+B9I3O3nTd2DU1/buMQudtAm ec8teG2jLXeXG5ueL0pA3gHgTkibckgmNFy+ULsH/qkQKPURQYLizI5Qa00kYzCD HznqaeZSJLcMFXSMMbpx9WJZ5yMBgQTKKdWzHAS+nuNMAWDpR0i3rDRUFkE0IiQ9 v2AW0i9s4Vo8o7ZKn5wMUVbK5wOpNh7iM8ujXSp5rxVSBDuw2oMCWnYyLVkynSiy 4WdS441W2mxktzoG9e0xsAd0ww3mxU65jp5Q1gQXJYrLAr7FzyAvXpCSd0M4yxni 27FQxHvS9iVl1POVM8ZegDA/czwOuLFFVM6Gc1sPeY5QN0cFQ0FLuSP2lfL6oh2z SO4aCJRPH33hA8oFrvTJwhZ+hOr5Ev9L5NWsNUH4jU2C28FmVvs= =WEti -----END PGP SIGNATURE----- Merge tag 'linux-cpupower-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility fix for v5.6 from Shuah Khan: "This cpupower update for Linux 5.6-rc6 consists of a fix from Mike Gilbert for build failures when -fno-common is enabled. -fno-common will be default in gcc v10." * tag 'linux-cpupower-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: avoid multiple definition with gcc -fno-common	2020-03-10 09:52:04 +01:00
Marek Szyprowski	07dc3678ba	drm/exynos: Fix cleanup of IOMMU related objects Store the IOMMU mapping created by the device core of each Exynos DRM sub-device and restore it when the Exynos DRM driver is unbound. This fixes IOMMU initialization failure for the second time when a deferred probe is triggered from the bind() callback of master's compound DRM driver. This also fixes the following issue found using kmemleak detector: unreferenced object 0xc2137640 (size 64): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 50 a3 14 c2 80 a2 14 c2 01 00 00 00 20 00 00 00 P........... ... 00 10 00 00 00 80 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xc214a280 (size 128): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 a0 ec ed 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xedeca000 (size 4096): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 [<8cd12507>] 0x0 unreferenced object 0xc214a300 (size 128): comm "swapper/0", pid 1, jiffies 4294937900 (age 3127.400s) hex dump (first 32 bytes): 00 a3 14 c2 00 a3 14 c2 00 40 18 c2 00 80 18 c2 .........@...... 02 00 02 00 ad 4e ad de ff ff ff ff ff ff ff ff .....N.......... backtrace: [<08cbd8bc>] iommu_domain_alloc+0x24/0x50 [<b835abee>] arm_iommu_create_mapping+0xe4/0x134 [<3acd268d>] arch_setup_dma_ops+0x4c/0x104 [<9f7d2cce>] of_dma_configure+0x19c/0x3a4 [<ba07704b>] really_probe+0xb0/0x47c [<4f510e4f>] driver_probe_device+0x78/0x1c4 [<7481a0cf>] device_driver_attach+0x58/0x60 [<0ff8f5c1>] __driver_attach+0xb8/0x158 [<86006144>] bus_for_each_dev+0x74/0xb4 [<10159dca>] bus_add_driver+0x1c0/0x200 [<8a265265>] driver_register+0x74/0x108 [<e0f3451a>] exynos_drm_init+0xb0/0x134 [<db3fc7ba>] do_one_initcall+0x90/0x458 [<6da35917>] kernel_init_freeable+0x188/0x200 [<db3f74d4>] kernel_init+0x8/0x110 [<1f3cddf9>] ret_from_fork+0x14/0x20 Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2020-03-10 13:25:18 +09:00
David S. Miller	2362059427	Here is a batman-adv bugfix: - Don't schedule OGM for disabled interface, by Sven Eckelmann -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl5iPBEWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoX+wEAC3nG5q5Yu6G81vSdQiu8ey4xjR CIGl+kSxX/ryK7GaEhvSZ+/rdFbVsJN276CjvLTbnwIHf0QJV6a6A68+Xo0qG6Nc 7Bpo9VP43GQJfmzFdqMQz5W/oVd88XKCfF7V9Cf+FIMkBTmuuxGfmNKcjpJVWZD4 imQihGpknW/QUjlKuIVHD79ZEBn75iCR0HYK8Xuc88XIDUUDA/fROdXJS8jge4BM 4UBFn46xQndBAVOJkYwd2FKhLwLT0YVJqMLBoQBf3Lq5Td6R2x6yrzXfyzNlKXEp oaZsHSB0Zo9v7ICxqY+CA0Yk8SxRrbGV2+cxjOHgudDcDnWF0MELcah0KOmcsPcx +G+dacTECrI0lxDu+LAWL//weKjHVh7WOeJk8wqRDgSV7AmCrSZS83v1cRXPQF41 0UTKbWPnCe6TANmgllt+H7ere2O9vGbrLPD9jLH8kE1eDgfG1lLC8PjIxA5L/6T5 sEvDg2/6JdQtAjqrAc+MP4hm4H2yEzK+n6ozA1Fn9gXn1RiKFr/zOB33PyaHUkUL K2MXVZANW0RLKRIb/zXQ8bJcMEAU7aeHZytm7n/JmRpSkRV1uvJSgeOF/NvZFg9P uBE0TO2RzXmJyUBzdSdZfrCSPobFbR1Gs9Uks+1U5FXjxw6j9adx+/5WuRbhJii1 qdv3ueo7awMeX3RrJg== =JJc9 -----END PGP SIGNATURE----- Merge tag 'batadv-net-for-davem-20200306' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Don't schedule OGM for disabled interface, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 19:08:43 -07:00
Vladimir Oltean	a8015ded89	net: mscc: ocelot: properly account for VLAN header length when setting MRU What the driver writes into MAC_MAXLEN_CFG does not actually represent VLAN_ETH_FRAME_LEN but instead ETH_FRAME_LEN + ETH_FCS_LEN. Yes they are numerically equal, but the difference is important, as the switch treats VLAN-tagged traffic specially and knows to increase the maximum accepted frame size automatically. So it is always wrong to account for VLAN in the MAC_MAXLEN_CFG register. Unconditionally increase the maximum allowed frame size for double-tagged traffic. Accounting for the additional length does not mean that the other VLAN membership checks aren't performed, so there's no harm done. Also, stop abusing the MTU name for configuring the MRU. There is no support for configuring the MRU on an interface at the moment. Fixes: `a556c76adc` ("net: mscc: Add initial Ocelot switch support") Fixes: `fa914e9c4d` ("net: mscc: ocelot: create a helper for changing the port MTU") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 18:58:17 -07:00
Eric Dumazet	afe207d80a	ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast() Commit `e18b353f10` ("ipvlan: add cond_resched_rcu() while processing muticast backlog") added a cond_resched_rcu() in a loop using rcu protection to iterate over slaves. This is breaking rcu rules, so lets instead use cond_resched() at a point we can reschedule Fixes: `e18b353f10` ("ipvlan: add cond_resched_rcu() while processing muticast backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 18:32:03 -07:00
Dmitry Yakunin	018d26fcd1	cgroup, netclassid: periodically release file_lock on classid updating In our production environment we have faced with problem that updating classid in cgroup with heavy tasks cause long freeze of the file tables in this tasks. By heavy tasks we understand tasks with many threads and opened sockets (e.g. balancers). This freeze leads to an increase number of client timeouts. This patch implements following logic to fix this issue: аfter iterating 1000 file descriptors file table lock will be released thus providing a time gap for socket creation/deletion. Now update is non atomic and socket may be skipped using calls: dup2(oldfd, newfd); close(oldfd); But this case is not typical. Moreover before this patch skip is possible too by hiding socket fd in unix socket buffer. New sockets will be allocated with updated classid because cgroup state is updated before start of the file descriptors iteration. So in common cases this patch has no side effects. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 18:13:39 -07:00
Mahesh Bandewar	ce9a4186f9	macvlan: add cond_resched() during multicast processing The Rx bound multicast packets are deferred to a workqueue and macvlan can also suffer from the same attack that was discovered by Syzbot for IPvlan. This solution is not as effective as in IPvlan. IPvlan defers all (Tx and Rx) multicast packet processing to a workqueue while macvlan does this way only for the Rx. This fix should address the Rx codition to certain extent. Tx is still suseptible. Tx multicast processing happens when .ndo_start_xmit is called, hence we cannot add cond_resched(). However, it's not that severe since the user which is generating / flooding will be affected the most. Fixes: `412ca1550c` ("macvlan: Move broadcasts into a work queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 18:02:19 -07:00
Mahesh Bandewar	e18b353f10	ipvlan: add cond_resched_rcu() while processing muticast backlog If there are substantial number of slaves created as simulated by Syzbot, the backlog processing could take much longer and result into the issue found in the Syzbot report. INFO: rcu_sched detected stalls on CPUs/tasks: (detected by 1, t=10502 jiffies, g=5049, c=5048, q=752) All QSes seen, last rcu_sched kthread activity 10502 (4294965563-4294955061), jiffies_till_next_fqs=1, root ->qsmask 0x0 syz-executor.1 R running task on cpu 1 10984 11210 3866 0x30020008 179034491270 Call Trace: <IRQ> [<ffffffff81497163>] _sched_show_task kernel/sched/core.c:8063 [inline] [<ffffffff81497163>] _sched_show_task.cold+0x2fd/0x392 kernel/sched/core.c:8030 [<ffffffff8146a91b>] sched_show_task+0xb/0x10 kernel/sched/core.c:8073 [<ffffffff815c931b>] print_other_cpu_stall kernel/rcu/tree.c:1577 [inline] [<ffffffff815c931b>] check_cpu_stall kernel/rcu/tree.c:1695 [inline] [<ffffffff815c931b>] __rcu_pending kernel/rcu/tree.c:3478 [inline] [<ffffffff815c931b>] rcu_pending kernel/rcu/tree.c:3540 [inline] [<ffffffff815c931b>] rcu_check_callbacks.cold+0xbb4/0xc29 kernel/rcu/tree.c:2876 [<ffffffff815e3962>] update_process_times+0x32/0x80 kernel/time/timer.c:1635 [<ffffffff816164f0>] tick_sched_handle+0xa0/0x180 kernel/time/tick-sched.c:161 [<ffffffff81616ae4>] tick_sched_timer+0x44/0x130 kernel/time/tick-sched.c:1193 [<ffffffff815e75f7>] __run_hrtimer kernel/time/hrtimer.c:1393 [inline] [<ffffffff815e75f7>] __hrtimer_run_queues+0x307/0xd90 kernel/time/hrtimer.c:1455 [<ffffffff815e90ea>] hrtimer_interrupt+0x2ea/0x730 kernel/time/hrtimer.c:1513 [<ffffffff844050f4>] local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1031 [inline] [<ffffffff844050f4>] smp_apic_timer_interrupt+0x144/0x5e0 arch/x86/kernel/apic/apic.c:1056 [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778 RIP: 0010:do_raw_read_lock+0x22/0x80 kernel/locking/spinlock_debug.c:153 RSP: 0018:ffff8801dad07ab8 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff12 RAX: 0000000000000000 RBX: ffff8801c4135680 RCX: 0000000000000000 RDX: 1ffff10038826afe RSI: ffff88019d816bb8 RDI: ffff8801c41357f0 RBP: ffff8801dad07ac0 R08: 0000000000004b15 R09: 0000000000310273 R10: ffff88019d816bb8 R11: 0000000000000001 R12: ffff8801c41357e8 R13: 0000000000000000 R14: ffff8801cfb19850 R15: ffff8801cfb198b0 [<ffffffff8101460e>] __raw_read_lock_bh include/linux/rwlock_api_smp.h:177 [inline] [<ffffffff8101460e>] _raw_read_lock_bh+0x3e/0x50 kernel/locking/spinlock.c:240 [<ffffffff840d78ca>] ipv6_chk_mcast_addr+0x11a/0x6f0 net/ipv6/mcast.c:1006 [<ffffffff84023439>] ip6_mc_input+0x319/0x8e0 net/ipv6/ip6_input.c:482 [<ffffffff840211c8>] dst_input include/net/dst.h:449 [inline] [<ffffffff840211c8>] ip6_rcv_finish+0x408/0x610 net/ipv6/ip6_input.c:78 [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:292 [inline] [<ffffffff840214de>] NF_HOOK include/linux/netfilter.h:286 [inline] [<ffffffff840214de>] ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:278 [<ffffffff83a29efa>] __netif_receive_skb_one_core+0x12a/0x1f0 net/core/dev.c:5303 [<ffffffff83a2a15c>] __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:5417 [<ffffffff83a2f536>] process_backlog+0x216/0x6c0 net/core/dev.c:6243 [<ffffffff83a30d1b>] napi_poll net/core/dev.c:6680 [inline] [<ffffffff83a30d1b>] net_rx_action+0x47b/0xfb0 net/core/dev.c:6748 [<ffffffff846002c8>] __do_softirq+0x2c8/0x99a kernel/softirq.c:317 [<ffffffff813e656a>] invoke_softirq kernel/softirq.c:399 [inline] [<ffffffff813e656a>] irq_exit+0x16a/0x1a0 kernel/softirq.c:439 [<ffffffff84405115>] exiting_irq arch/x86/include/asm/apic.h:561 [inline] [<ffffffff84405115>] smp_apic_timer_interrupt+0x165/0x5e0 arch/x86/kernel/apic/apic.c:1058 [<ffffffff84401cbe>] apic_timer_interrupt+0x8e/0xa0 arch/x86/entry/entry_64.S:778 </IRQ> RIP: 0010:__sanitizer_cov_trace_pc+0x26/0x50 kernel/kcov.c:102 RSP: 0018:ffff880196033bd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12 RAX: ffff88019d8161c0 RBX: 00000000ffffffff RCX: ffffc90003501000 RDX: 0000000000000002 RSI: ffffffff816236d1 RDI: 0000000000000005 RBP: ffff880196033bd8 R08: ffff88019d8161c0 R09: 0000000000000000 R10: 1ffff10032c067f0 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [<ffffffff816236d1>] do_futex+0x151/0x1d50 kernel/futex.c:3548 [<ffffffff816260f0>] C_SYSC_futex kernel/futex_compat.c:201 [inline] [<ffffffff816260f0>] compat_SyS_futex+0x270/0x3b0 kernel/futex_compat.c:175 [<ffffffff8101da17>] do_syscall_32_irqs_on arch/x86/entry/common.c:353 [inline] [<ffffffff8101da17>] do_fast_syscall_32+0x357/0xe1c arch/x86/entry/common.c:415 [<ffffffff84401a9b>] entry_SYSENTER_compat+0x8b/0x9d arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f23c69 RSP: 002b:00000000f5d1f12c EFLAGS: 00000282 ORIG_RAX: 00000000000000f0 RAX: ffffffffffffffda RBX: 000000000816af88 RCX: 0000000000000080 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000816af8c RBP: 00000000f5d1f228 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 rcu_sched kthread starved for 10502 jiffies! g5049 c5048 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=1 rcu_sched R running task on cpu 1 13048 8 2 0x90000000 179099587640 Call Trace: [<ffffffff8147321f>] context_switch+0x60f/0xa60 kernel/sched/core.c:3209 [<ffffffff8100095a>] __schedule+0x5aa/0x1da0 kernel/sched/core.c:3934 [<ffffffff810021df>] schedule+0x8f/0x1b0 kernel/sched/core.c:4011 [<ffffffff8101116d>] schedule_timeout+0x50d/0xee0 kernel/time/timer.c:1803 [<ffffffff815c13f1>] rcu_gp_kthread+0xda1/0x3b50 kernel/rcu/tree.c:2327 [<ffffffff8144b318>] kthread+0x348/0x420 kernel/kthread.c:246 [<ffffffff84400266>] ret_from_fork+0x56/0x70 arch/x86/entry/entry_64.S:393 Fixes: `ba35f8588f` (“ipvlan: Defer multicast / broadcast processing to a work-queue”) Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 18:00:58 -07:00
Mahesh Bandewar	ad8192767c	ipvlan: don't deref eth hdr before checking it's set IPvlan in L3 mode discards outbound multicast packets but performs the check before ensuring the ether-header is set or not. This is an error that Eric found through code browsing. Fixes: `2ad7bf3638` (“ipvlan: Initial check-in of the IPVLAN driver.”) Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 17:59:25 -07:00
Edward Cree	4b1bd9db07	sfc: detach from cb_page in efx_copy_channel() It's a resource, not a parameter, so we can't copy it into the new channel's TX queues, otherwise aliasing will lead to resource- management bugs if the channel is subsequently torn down without being initialised. Before the Fixes:-tagged commit there was a similar bug with tsoh_page, but I'm not sure it's worth doing another fix for such old kernels. Fixes: `e9117e5099` ("sfc: Firmware-Assisted TSO version 2") Suggested-by: Derek Shute <Derek.Shute@stratus.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-09 17:44:05 -07:00
Jin Yao	f787feff69	perf block-info: Support color ops to print block percents in color It would be nice to print the block percents with colors. This patch supports the 'Sampled Cycles%' and 'Avg Cycles%' printed in colors. For example, perf record -b ... perf report --total-cycles or perf report --total-cycles --stdio percent > 5%, colored in red percent > 0.5%, colored in green percent < 0.5%, default color Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:25 -03:00
Jin Yao	cca0cc76f5	perf block-info: Allow selecting which columns to report and its order Currently we use a predefined array to set the block info output formats, it's fixed and inflexible. This patch adds two parameters "block_hpps" and "nr_hpps" in block_info__create_report and other static functions, in order to let user decide which columns to report and with specified report ordering. It should be more flexible. Buffers will be allocated to contain the new fmts, of course, we need to release them before perf exits. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:25 -03:00
Jin Yao	a8a9f6dc0d	perf diff: Use __block_info__cmp() to replace block_pair_cmp() 'perf diff' uses block_pair_cmp() to compare two blocks. But block_info__cmp() has the similar functionality and it's a bit more complete. This patch removes block_pair_cmp() and uses __block_info__cmp() instead. __block_info__cmp() is wrapped by block_info__cmp() and it doesn't receives a perf_hpp_fmt parameter. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:25 -03:00
Jin Yao	3e152aa984	perf block-info: Fix wrong block address comparison in block_info__cmp() Commit `6041441870` ("perf block: Cleanup and refactor block info functions") introduces block_info__cmp(), which compares two blocks. But the issues are: 1. It should return the strcmp cmp value only if it's not 0. 2. When symbol names are matched, we need to compare the addresses of blocks further. But it wrongly uses the symbol addresses for comparison. 3. If the syms are both NULL, we can't consider these two blocks are matched. This patch fixes above 3 issues. Fixes: `6041441870` ("perf block: Cleanup and refactor block info functions") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200202141655.32053-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:25 -03:00
Jiri Olsa	d942815a76	perf expr: Make expr__parse() return -1 on error To match the error value of the expr__find_other function, so all exported expr functions return the same values: 0 on success, -1 on error. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:25 -03:00
Jiri Olsa	0f9b1e124b	perf expr: Straighten expr__parse()/expr__find_other() interface Now that we have a flex parser we don't need to update the parsed string pointer, so the interface can just be passed the pointer to the expression instead of a pointer to pointer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Jiri Olsa	58ca707636	perf expr: Increase EXPR_MAX_OTHER to support metrics with more than 15 variables We have metrics that define more than 15 variables, like Branch_Misprediction_Cost. Increasing the allowed variables count to 20. As Andy pointed out, we can't go too high in here, because some of the code has O(n^2) complexity (already_seen) and we might want to do some other changes (like using hash tables) before increasing the maximum even more. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Jiri Olsa	26226a9772	perf expr: Move expr lexer to flex Adding expr flex code instead of the manual parser code. So it's easily extensible in upcoming changes. The new flex code is in flex.l object and gets compiled like all the other flexers we use. It's defined as flex reentrant parser. It's used by both expr__parse and expr__find_other interfaces by separating the starting point. There's no intended change of functionality ;-) the test expr is passing. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Jiri Olsa	576a65b697	perf expr: Add expr.c object Add generic expr code into new expr.c object. The expr.c object will be mainly used in following change that will get rid of the manual flex code, Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200228093616.67125-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Kan Liang	277ce1efa7	perf header: Add check for unexpected use of reserved membrs in event attr The perf.data may be generated by a newer version of perf tool, which support new input bits in attr, e.g. new bit for branch_sample_type. The perf.data may be parsed by an older version of perf tool later. The old perf tool may parse the perf.data incorrectly. There is no warning message for this case. Current perf header never check for unknown input bits in attr. When read the event desc from header, check the stored event attr. The reserved bits, sample type, read format and branch sample type will be checked. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lkml.kernel.org/r/20200228163011.19358-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Kan Liang	d3f85437ad	perf evsel: Support PERF_SAMPLE_BRANCH_HW_INDEX A new branch sample type PERF_SAMPLE_BRANCH_HW_INDEX has been introduced in latest kernel. Enable HW_INDEX by default in LBR call stack mode. If kernel doesn't support the sample type, switching it off. Add HW_INDEX in attr_fprintf as well. User can check whether the branch sample type is set via debug information or header. Committer testing: First collect some samples with LBR callchains, system wide, for a few seconds: # perf record --call-graph lbr -a sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.625 MB perf.data (224 samples) ] # Now lets use 'perf evlist -v' to look at the branch_sample_type: # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CALLCHAIN\|CPU\|PERIOD\|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER\|CALL_STACK\|NO_FLAGS\|NO_CYCLES\|HW_INDEX # So the machine has the kernel feature, and it was correctly added to perf_event_attr.branch_sample_type, for the default 'cycles' event. If we do it in another machine, where the kernel lacks the HW_INDEX feature, we get: # perf record --call-graph lbr -a sleep 2s [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.690 MB perf.data (499 samples) ] # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|CALLCHAIN\|CPU\|PERIOD\|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER\|CALL_STACK\|NO_FLAGS\|NO_CYCLES # No HW_INDEX in attr.branch_sample_type. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:43:24 -03:00
Kan Liang	42bbabed09	perf tools: Add hw_idx in struct branch_stack The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-03-09 21:42:53 -03:00

... 3 4 5 6 7 ...

902227 Commits