linux

Author	SHA1	Message	Date
Rafael J. Wysocki	8edb0a6e48	intel_pstate: Use sample.core_avg_perf in get_avg_pstate() Notice that get_avg_pstate() can use sample.core_avg_perf instead of carrying the same division again, so make it do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Rafael J. Wysocki	a1c9787dc3	intel_pstate: Clarify average performance computation The core_pct_busy field of struct sample actually contains the average performace during the last sampling period (in percent) and not the utilization of the core as suggested by its name which is confusing. For this reason, change the name of that field to core_avg_perf and rename the function that computes its value accordingly. Also notice that storing this value as percentage requires a costly integer multiplication to be carried out in a hot path, so instead store it as an "extended fixed point" value with more fraction bits and update the code using it accordingly (it is better to change the name of the field along with its meaning in one go than to make those two changes separately, as that would likely lead to more confusion). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:58:37 +02:00
Chen Yu	4578ee7e1d	intel_pstate: Avoid unnecessary synchronize_sched() during initialization Currently, in intel_pstate_clear_update_util_hook(), after clearing the utilization update hook, we leverage synchronize_sched() to deal with synchronization, which is a little bit time-costly because synchronize_sched() has to wait for all the CPUs to go through a grace period. Actually, the synchronize_sched() is not necessary if the utilization update hook has not been set for the given CPU yet, so make the driver check if that's the case and avoid the synchronize_sched() call then. Link: https://bugzilla.kernel.org/show_bug.cgi?id=116371 Tested-by: Tian Ye <yex.tian@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw : Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:56:34 +02:00
Rafael J. Wysocki	c29af6f1a4	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-05-11 22:48:20 +02:00
Arnd Bergmann	cfe9492fdf	cpufreq: schedutil: Make default depend on CONFIG_SMP CPU_FREQ_GOV_SCHEDUTIL gained a dependency on SMP, so now we get a warning if it gets selected by CPU_FREQ_DEFAULT_GOV_SCHEDUTIL without SMP: warning: (CPU_FREQ_DEFAULT_GOV_SCHEDUTIL) selects CPU_FREQ_GOV_SCHEDUTIL which has unmet direct dependencies (CPU_FREQ && SMP) This adds another dependency to avoid the problem. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `bf7cdff194` (cpufreq: schedutil: Make it depend on CONFIG_SMP) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 22:47:44 +02:00
Akshay Adiga	0bc10b93f2	cpufreq: powernv: del_timer_sync when global and local pstate are equal When global and local pstate are equal in a powernv_target_index() call, we don't queue a timer. But we may have timer already queued for future. This could cause the timer to fire one additional time for no use. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 02:28:00 +02:00
Akshay Adiga	1fd3ff2874	cpufreq: powernv: Move smp_call_function_any() out of irq safe block Fix a WARN_ON caused by smp_call_function_any() when irq is disabled, because of changes made in the patch ('cpufreq: powernv: Ramp-down global pstate slower than local-pstate') https://patchwork.ozlabs.org/patch/612058/ WARNING: CPU: 0 PID: 4 at kernel/smp.c:291 smp_call_function_single+0x170/0x180 Call Trace: [c0000007f648f9f0] [c0000007f648fa90] 0xc0000007f648fa90 (unreliable) [c0000007f648fa30] [c0000000001430e0] smp_call_function_any+0x170/0x1c0 [c0000007f648fa90] [c0000000007b4b00] powernv_cpufreq_target_index+0xe0/0x250 [c0000007f648fb00] [c0000000007ac9dc] __cpufreq_driver_target+0x20c/0x3d0 [c0000007f648fbc0] [c0000000007b1b4c] od_dbs_timer+0xcc/0x260 [c0000007f648fc10] [c0000000007b3024] dbs_work_handler+0x54/0xa0 [c0000007f648fc50] [c0000000000c49a8] process_one_work+0x1d8/0x590 [c0000007f648fce0] [c0000000000c4e08] worker_thread+0xa8/0x660 [c0000007f648fd80] [c0000000000cca88] kthread+0x108/0x130 [c0000007f648fe30] [c0000000000095e8] ret_from_kernel_thread+0x5c/0x74 - Calling smp_call_function_any() with interrupt disabled (through spin_lock_irqsave) could cause a deadlock, as smp_call_function_any() relies on the IPI to complete. This is detected in the smp_call_function_any() call and hence the WARN_ON. - As the spinlock (gpstates->lock) is only used to synchronize access of global_pstate_info between timer irq handler and target_index calls. And the timer irq handler just try_locks() hence it would not cause a deadlock. Hence could do without making spinlocks irq safe. - As the smp_call_function_any() is a blocking call and does not access global_pstates_info, it could reduce the critcal section by moving smp_call_function_any() after giving up the lock. Reported-by: Abdul Haleem <abdhalee@linux.vnet.linux.com> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-11 02:28:00 +02:00
Rafael J. Wysocki	f96fd0c86f	intel_pstate: Clean up intel_pstate_get() intel_pstate_get() contains a local variable that's initialized but never used and it can be written in fewer lines of code, so clean it up. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-05-09 22:55:36 +02:00
Rafael J. Wysocki	d87de8f38a	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-05-09 16:00:10 +02:00
Rafael J. Wysocki	bf7cdff194	cpufreq: schedutil: Make it depend on CONFIG_SMP Make the schedutil cpufreq governor depend on CONFIG_SMP, because the scheduler-provided utilization numbers used by it are only available with CONFIG_SMP set. Fixes: `9bdcb44e39` (cpufreq: schedutil: New governor based on scheduler utilization data) Reported-by: Steve Muckle <steve.muckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-06 22:33:33 +02:00
Rafael J. Wysocki	da43af961b	Merge cpufreq fixes going into v4.6. * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform cpufreq: intel_pstate: Fix processing for turbo activation ratio	2016-05-06 22:01:14 +02:00
Rafael J. Wysocki	9485e4ca0b	cpufreq: governor: Fix handling of special cases in dbs_update() As reported in KBZ 69821: "With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz even if usage goes to 100%, frequency does not scale up, the governor in use is ondemand. Neither works conservative. Performance and userspace governors work as expected. With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand as expected." Analysis carried out by Chen Yu leads to the conclusion that the observed issue is due to idle_time in dbs_update() representing a negative number in which case the function will return 0 as the load (unless load is greater than 0 for another CPU sharing the policy), although that need not be the right choice. Indeed, idle_time representing a negative number means that during the last sampling interval the CPU was almost 100% busy on the rough average, so 100 should be returned as the load in that case. Modify the code accordingly and rearrange it to clarify the handling of all of the special cases in it. While at it, also avoid returning zero as the load if time_elapsed is 0 (it doesn't really make sense to return 0 then). Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821 Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Timo Valtoaho <timo.valtoaho@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-06 14:24:23 +02:00
Rafael J. Wysocki	5f2f88e330	Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes' * pm-opp-fixes: PM / OPP: Remove useless check * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform * pm-cpuidle-fixes: ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value	2016-05-06 13:16:22 +02:00
Ingo Molnar	1fb48f8e54	Linux 4.6-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXJoi6AAoJEHm+PkMAQRiGYKIIAIcocIV48DpGAHXFuSZbzw5D rp9EbE5TormtddPz1J1zcqu9tl5H8tfxS+LvHqRaDXqQkbb0BWKttmEpKTm9mrH8 kfGNW8uwrEgTMMsar54BypAMMhHz4ITsj3VQX5QLSC5j6wixMcOmQ+IqH0Bwt3wr Y5JXDtZRysI1GoMkSU7/qsQBjC7aaBa5VzVUiGvhV8DdvPVFQf73P89G1vzMKqb5 HRWbH4ieu6/mclLvW9N2QKGMHQntlB+9m2kG9nVWWbBSDxpAotwqQZFh3D52MBUy 6DH/PNgkVyDhX4vfjua0NrmXdwTfKxLWGxe4dZ8Z+JZP5c6pqWlClIPBCkjHj50= =KLSM -----END PGP SIGNATURE----- Merge tag 'v4.6-rc6' into x86/asm, to refresh the tree Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-05-05 08:35:00 +02:00
Srinivas Pandruvada	e59a8f7ff4	cpufreq: intel_pstate: Ignore _PPC processing under HWP When HWP (hardware P states) feature is active, the ACPI _PSS and _PPC is not used. So ignore processing for _PPC limits. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:43:47 +02:00
Sudeep Holla	d9975b0b07	cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table Currently when performing random CPU hot-plugs and suspend-to-ram(S2R) on systems using arm_big_little cpufreq driver, we get warnings similar to something like below: cpu cpu1: _opp_add: duplicate OPPs detected. Existing: freq: 600000000, volt: 800000, enabled: 1. New: freq: 600000000, volt: 800000, enabled: 1 This is mainly because the OPPs for the shared cpus are not set. We can just use dev_pm_opp_of_cpumask_add_table in case the OPPs are obtained from DT(arm_big_little_dt.c) or use dev_pm_opp_set_sharing_cpus if the OPPs are obtained by other means like firmware(e.g. scpi-cpufreq.c) Also now that the generic dev_pm_opp{,_of}_cpumask_remove_table can handle removal of opp table and entries for all associated CPUs, we can re-use dev_pm_opp{,_of}_cpumask_remove_table as free_opp_table in cpufreq_arm_bL_ops. This patch makes necessary changes to reuse the generic OPP functions for {init,free}_opp_table and thereby eliminating the warnings. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:40:04 +02:00
Marc Gonzalez	d9c99acb63	cpufreq: tango: Use generic platdev driver Add tango4 compatible string to the list. Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:36:57 +02:00
Sai Gurrappadi	e43e94c1ed	cpufreq: Fix GOV_LIMITS handling for the userspace governor Currently, the userspace governor only updates frequency on GOV_LIMITS if policy->cur falls outside policy->{min/max}. However, it is also necessary to update current frequency on GOV_LIMITS to match the user requested value if it can be achieved within the new policy->{max/min}. This was previously the behaviour in the governor until commit `d1922f0` ("cpufreq: Simplify userspace governor") which incorrectly assumed that policy->cur == user requested frequency via scaling_setspeed. This won't be true if the user requested frequency falls outside policy->{min/max}. Ex: a temporary thermal cap throttled the user requested frequency. Fix this by storing the user requested frequency in a seperate variable. The governor will then try to achieve this request on every GOV_LIMITS change. Fixes: `d1922f0256` (cpufreq: Simplify userspace governor) Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-05 01:30:38 +02:00
Rafael J. Wysocki	6d45b719cb	intel_pstate: Fix intel_pstate_get() After commit `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency() to compute the average frequency, which is problematic for two reasons. First, intel_pstate_get() may be invoked before the driver reads the CPU feedback registers for the first time and if that happens, get_avg_frequency() will attempt to divide by zero. Second, the get_avg_frequency() call in intel_pstate_get() is racy with respect to intel_pstate_sample() and it may end up returning completely meaningless values for this reason. Moreover, after commit `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" sample.core_pct_busy is never computed on Atom, but it is used in intel_pstate_adjust_busy_pstate() in that case too. To address those problems notice that if sample.core_pct_busy was used in the average frequency computation carried out by get_avg_frequency(), both the divide by zero problem and the race with respect to intel_pstate_sample() would be avoided. Accordingly, move the invocation of intel_pstate_calc_busy() from get_target_pstate_use_performance() to intel_pstate_update_util(), which also will take care of the uninitialized sample.core_pct_busy on Atom, and modify get_avg_frequency() to use sample.core_pct_busy as per the above. Reported-by: kernel test robot <ying.huang@linux.intel.com> Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4 Fixes: `8fa520af50` "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" Fixes: `7349ec0470` "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-05-04 14:09:16 +02:00
Rafael J. Wysocki	ba41e1bc28	cpufreq: intel_pstate: Fix HWP on boot CPU after system resume Commit `41cfd64cf4` "Update frequencies of policy->cpus only from ->set_policy()" changed the way the intel_pstate driver's ->set_policy callback updates the HWP (hardware-managed P-states) settings. A side effect of it is that if those settings are modified on the boot CPU during system suspend and wakeup, they will never be restored during subsequent system resume. To address this problem, allow cpufreq drivers that don't provide ->target or ->target_index callbacks to use ->suspend and ->resume callbacks and add a ->resume callback to intel_pstate to restore the HWP settings on the CPUs that belong to the given policy. Fixes: `41cfd64cf4` "Update frequencies of policy->cpus only from ->set_policy()" Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-05-02 13:48:15 +02:00
Aneesh Kumar K.V	d2adba3fd1	powerpc/mm: Abstraction for switch_mmu_context() How we switch MMU context differs between hash and radix. For hash we need to switch the SLB details and for radix we need to switch the PID SPR. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-05-01 18:33:04 +10:00
Rafael J. Wysocki	81be193b7e	Merge branch 'pm-cpufreq-fixes' * pm-cpufreq-fixes: cpufreq: intel_pstate: Fix processing for turbo activation ratio Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"	2016-04-29 14:22:25 +02:00
Sudeep Holla	2482bc31ca	cpufreq: st: enable selective initialization based on the platform The sti-cpufreq does unconditional registration of the cpufreq-dt driver which causes issue on an multi-platform build. For example, on Vexpress TC2 platform, we get the following error on boot: cpu cpu0: OPP-v2 not supported cpu cpu0: Not doing voltage scaling cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table for cpu:0, -19 cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6) ... arm_big_little: bL_cpufreq_register: Failed registering platform driver: vexpress-spc, err: -17 The actual driver fails to initialise as cpufreq-dt is probed successfully, which is incorrect. This issue can happen to any platform not using cpufreq-dt in a multi-platform build. This patch adds a check to do selective initialization of the driver. Fixes: `ab0ea257fc` (cpufreq: st: Provide runtime initialised driver for ST's platforms) Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:25:56 +02:00
Viresh Kumar	9f123def55	cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/ Move cpufreq bits for mvebu into drivers/cpufreq/ directory, that's where they really belong to. Compiled tested only. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:43 +02:00
Viresh Kumar	eb96924acd	cpufreq: dt: Kill platform-data There are no more users of platform-data for cpufreq-dt driver, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:43 +02:00
Viresh Kumar	1530b9963e	cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2 Existing platforms, which do not support operating-points-v2, can explicitly tell the opp core that some of the CPUs share opp tables, with help of dev_pm_opp_set_sharing_cpus(). For such platforms, explicitly ask the opp core to provide list of CPUs sharing the opp table with current cpu device, before falling back to platform data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 15:22:42 +02:00
Rafael J. Wysocki	b4f4b4b371	cpufreq: governor: Change confusing struct field and variable names The name of the prev_cpu_wall field in struct cpu_dbs_info is confusing, because it doesn't represent wall time, but the previous update time as returned by get_cpu_idle_time() (that may be the current value of jiffies_64 in some cases, for example). Moreover, the names of some related variables in dbs_update() take that confusion further. Rename all of those things to make their names reflect the purpose more accurately. While at it, drop unnecessary parens from one of the updated expressions. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Chen Yu <yu.c.chen@intel.com>	2016-04-28 15:10:08 +02:00
Srinivas Pandruvada	2b3ec76505	cpufreq: intel_pstate: Enable PPC enforcement for servers For platforms which are controlled via remove node manager, enable _PPC by default. These platforms are mostly categorized as enterprise server or performance servers. These platforms needs to go through some certifications tests, which tests control via _PPC. The relative risk of enabling by default is low as this is is less likely that these systems have broken _PSS table. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Srinivas Pandruvada	3be9200d51	cpufreq: intel_pstate: Adjust policy->max When policy->max is changed via _PPC or sysfs and is more than the max non turbo frequency, it does not really change resulting performance in some processors. When policy->max results in a P-State ratio more than the turbo activation ratio, then processor can choose any P-State up to max turbo. So the user or _PPC setting has no value, but this can cause undesirable side effects like: - Showing reduced max percentage in Intel P-State sysfs - It can cause reduced max performance under certain boundary conditions: The requested max scaling frequency either via _PPC or via cpufreq-sysfs, will be converted into a fixed floating point max percent scale. In majority of the cases this will result in correct max. But not 100% of the time. If the _PPC is requested at a point where the calculation lead to a lower max, this can result in a lower P-State then expected and it will impact performance. Example of this condition using a Broadwell laptop with config TDP. ACPI _PSS table from a Broadwell laptop `2301000` 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000 1300000 1100000 1000000 900000 800000 600000 500000 The actual results by disabling config TDP so that we can get what is requested on or below 2300000Khz. scaling_max_freq Max Requested P-State Resultant scaling max ---------------------------------------- ---------------------- 2400000 18 `2900000` (max turbo) 2300000 17 2300000 (max physical non turbo) 2200000 15 2100000 2100000 15 2100000 2000000 13 1900000 1900000 13 1900000 1800000 12 1800000 1700000 11 1700000 1600000 10 1600000 1500000 f 1500000 1400000 e 1400000 1300000 d 1300000 1200000 c 1200000 1100000 a 1000000 1000000 a 1000000 900000 9 900000 800000 8 800000 700000 7 700000 600000 6 600000 500000 5 500000 ------------------------------------------------------------------ Now set the config TDP level 1 ratio as 0x0b (equivalent to 1100000KHz) in BIOS (not every system will let you adjust this). The turbo activation ratio will be set to one less than that, which will be 0x0a (So any request above 1000000KHz should result in turbo region assuming no thermal limits). Here _PPC will request max to 1100000KHz (which basically should still result in turbo as this is more than the turbo activation ratio up to max allowable turbo frequency), but actual calculation resulted in a max ceiling P-State which is 0x0a. So under any load condition, this driver will not request turbo P-States. This will be a huge performance hit. When config TDP feature is ON, if the _PPC points to a frequency above turbo activation ratio, the performance can still reach max turbo. In this case we don't need to treat this as the reduced frequency in set_policy callback. In this change when config TDP is active (by checking if the physical max non turbo ratio is more than the current max non turbo ratio), any request above current max non turbo is treated as full performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw : Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Srinivas Pandruvada	9522a2ff9c	cpufreq: intel_pstate: Enforce _PPC limits Use ACPI _PPC notification to limit max P state driver will request. ACPI _PPC change notification is sent by BIOS to limit max P state in several cases: - Reduce impact of platform thermal condition - When Config TDP feature is used, a changed _PPC is sent to follow TDP change - Remote node managers in server want to control platform power via baseboard management controller (BMC) This change registers with ACPI processor performance lib so that _PPC changes are notified to cpufreq core, which in turns will result in call to .setpolicy() callback. Also the way _PSS table identifies a turbo frequency is not compatible to max turbo frequency in intel_pstate, so the very first entry in _PSS needs to be adjusted. This feature can be turned on by using kernel parameters: intel_pstate=support_acpi_ppc Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Minor cleanups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-28 01:01:39 +02:00
Akshay Adiga	eaa2c3aeef	cpufreq: powernv: Ramp-down global pstate slower than local-pstate The frequency transition latency from pmin to pmax is observed to be in few millisecond granurality. And it usually happens to take a performance penalty during sudden frequency rampup requests. This patch set solves this problem by using an entity called "global pstates". The global pstate is a Chip-level entity, so the global entitiy (Voltage) is managed across the cores. The local pstate is a Core-level entity, so the local entity (frequency) is managed across threads. This patch brings down global pstate at a slower rate than the local pstate. Hence by holding global pstates higher than local pstate makes the subsequent rampups faster. A per policy structure is maintained to keep track of the global and local pstate changes. The global pstate is brought down using a parabolic equation. The ramp down time to pmin is set to ~5 seconds. To make sure that the global pstates are dropped at regular interval , a timer is queued for every 2 seconds during ramp-down phase, which eventually brings the pstate down to local pstate. Iozone results show fairly consistent performance boost. YCSB on redis shows improved Max latencies in most cases. Iozone write/rewite test were made with filesizes 200704Kb and 401408Kb with different record sizes . The following table shows IOoperations/sec with and without patch. Iozone Results ( in op/sec) ( mean over 3 iterations ) --------------------------------------------------------------------- file size- with without % recordsize-IOtype patch patch change ---------------------------------------------------------------------- 200704-1-SeqWrite 1616532 1615425 0.06 200704-1-Rewrite 2423195 2303130 5.21 200704-2-SeqWrite 1628577 1602620 1.61 200704-2-Rewrite 2428264 2312154 5.02 200704-4-SeqWrite 1617605 1617182 0.02 200704-4-Rewrite 2430524 2351238 3.37 200704-8-SeqWrite 1629478 1600436 1.81 200704-8-Rewrite `2415308` 2298136 5.09 200704-16-SeqWrite 1619632 1618250 0.08 200704-16-Rewrite 2396650 2352591 1.87 200704-32-SeqWrite 1632544 1598083 2.15 200704-32-Rewrite 2425119 2329743 4.09 200704-64-SeqWrite 1617812 1617235 0.03 200704-64-Rewrite 2402021 2321080 3.48 200704-128-SeqWrite 1631998 1600256 1.98 200704-128-Rewrite 2422389 2304954 5.09 200704-256 SeqWrite 1617065 1616962 0.00 200704-256-Rewrite 2432539 2301980 5.67 200704-512-SeqWrite 1632599 1598656 2.12 200704-512-Rewrite 2429270 2323676 4.54 200704-1024-SeqWrite 1618758 1616156 0.16 200704-1024-Rewrite 2431631 2315889 4.99 401408-1-SeqWrite 1631479 1608132 1.45 401408-1-Rewrite 2501550 2459409 1.71 401408-2-SeqWrite 1617095 1626069 -0.55 401408-2-Rewrite 2507557 2443621 2.61 401408-4-SeqWrite 1629601 1611869 1.10 401408-4-Rewrite 2505909 2462098 1.77 401408-8-SeqWrite 1617110 1626968 -0.60 401408-8-Rewrite 2512244 2456827 2.25 401408-16-SeqWrite 1632609 1609603 1.42 401408-16-Rewrite 2500792 2451405 2.01 401408-32-SeqWrite 1619294 1628167 -0.54 401408-32-Rewrite 2510115 2451292 2.39 401408-64-SeqWrite 1632709 1603746 1.80 401408-64-Rewrite 2506692 2433186 3.02 401408-128-SeqWrite 1619284 1627461 -0.50 401408-128-Rewrite 2518698 2453361 2.66 401408-256-SeqWrite 1634022 1610681 1.44 401408-256-Rewrite 2509987 2446328 2.60 401408-512-SeqWrite 1617524 1628016 -0.64 401408-512-Rewrite 2504409 2442899 2.51 401408-1024-SeqWrite 1629812 1611566 1.13 401408-1024-Rewrite 2507620 2442968 2.64 Tested with YCSB workload (50% update + 50% read) over redis for 1 million records and 1 million operation. Each test was carried out with target operations per second and persistence disabled. Max-latency (in us)( mean over 5 iterations ) --------------------------------------------------------------- op/s Operation with patch without patch %change --------------------------------------------------------------- 15000 Read 61480.6 50261.4 22.32 15000 cleanup 215.2 293.6 -26.70 15000 update 25666.2 25163.8 2.00 25000 Read 32626.2 89525.4 -63.56 25000 cleanup 292.2 263.0 11.10 25000 update 32293.4 90255.0 -64.22 35000 Read 34783.0 33119.0 5.02 35000 cleanup 321.2 395.8 -18.8 35000 update 36047.0 38747.8 -6.97 40000 Read 38562.2 42357.4 -8.96 40000 cleanup 371.8 384.6 -3.33 40000 update 27861.4 41547.8 -32.94 45000 Read 42271.0 88120.6 -52.03 45000 cleanup 263.6 383.0 -31.17 45000 update 29755.8 81359.0 -63.43 (test without target op/s) 47659 Read 83061.4 136440.6 -39.12 47659 cleanup 195.8 193.8 1.03 47659 update 73429.4 124971.8 -41.24 Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 23:56:58 +02:00
Shilpasri G Bhat	2920e9ce8f	cpufreq: powernv: Remove flag use-case of policy->driver_data commit `1b0289848d` ("cpufreq: powernv: Add sysfs attributes to show throttle stats") used policy->driver_data as a flag for one-time creation of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to check if the attribute already exists. This is required as policy->driver_data is used for other purposes in the later patch. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 23:56:58 +02:00
Javier Martinez Canillas	6de0dc4b53	cpufreq: e_powersaver: Use IS_ENABLED() instead of checking for built-in or module The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either built-in or as a module, use that macro instead of open coding the same. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-27 22:42:34 +02:00
Srinivas Pandruvada	1becf03545	cpufreq: intel_pstate: Fix processing for turbo activation ratio When the config TDP level is not nominal (level = 0), the MSR values for reading level 1 and level 2 ratios contain power in low 14 bits and actual ratio bits are at bits [23:16]. The current processing for level 1 and level 2 is wrong as there is no shift done to get actual ratio. Fixes: `6a35fc2d6c` (cpufreq: intel_pstate: get P1 from TAR when available) Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 23:39:09 +02:00
Rafael J. Wysocki	ba1ca654f3	cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() The way cpufreq_governor_start() initializes j_cdbs->prev_load is questionable. First off, j_cdbs->prev_cpu_wall used as a denominator in the computation may be zero. The case this happens is when get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() used to return that number is called exactly at the jiffies_64 wrap time. It is rather hard to trigger that error, but it is not impossible and it will just crash the kernel then. Second, j_cdbs->prev_load is computed as the average load during the entire time since the system started and it may not reflect the load in the previous sampling period (as it is expected to). That doesn't play well with the way dbs_update() uses that value. Namely, if the update time delta (wall_time) happens do be greater than twice the sampling rate on the first invocation of it, the initial value of j_cdbs->prev_load (which may be completely off) will be returned to the caller as the current load (unless it is equal to zero and unless another CPU sharing the same policy object has a greater load value). For this reason, notice that the prev_load field of struct cpu_dbs_info is only used by dbs_update() and only in that one place, so if cpufreq_governor_start() is modified to always initialize it to 0, it will make dbs_update() always compute the actual load first time it checks the update time delta against the doubled sampling rate (after initialization) and there won't be any side effects of it. Consequently, modify cpufreq_governor_start() as described. Fixes: `18b46abd00` (cpufreq: governor: Be friendly towards latency-sensitive bursty workloads) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-25 16:21:34 +02:00
Viresh Kumar	3920be471c	cpufreq: hisilicon: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	5e4249c6d9	cpufreq: zynq: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	117d4f59af	cpufreq: sunxi: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:24 +02:00
Viresh Kumar	a399dc9fc5	cpufreq: shmobile: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Finley Xiao	014400c127	cpufreq: rockchip: Use generic platdev driver This patch add rockchip's compatible string to the compat list and remove similar code from platform code for supporting generic platdev driver. Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	7694ca6e1d	cpufreq: omap: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	7ead83f6df	cpufreq: imx: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Note that the complete routine imx27_dt_init() is removed as of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); has same effect as a NULL .init_machine machine callback pointer. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	a59511d1da	cpufreq: berlin: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:23 +02:00
Viresh Kumar	e92bb16674	cpufreq: dt: Mark platdev machines array as __initconst The machines array in cpufreq-dt-platdev is used only once at boot time and so should be marked with __initconst, so that kernel can free up memory used for it, if required. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:18:22 +02:00
Jia Hongtao	394cb8316b	cpufreq: qoriq: Fix cooling device registration issue during suspend Cooling device is registered by ready callback. It's also invoked while system resuming from sleep (Enabling non-boot cpus). Thus cooling device may be multiple registered. Matchable unregistration is added to exit callback to fix this issue. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:07:02 +02:00
Jia Hongtao	495c716f17	cpufreq: qoriq: Remove __exit macro from .exit callback .exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend. So __exit macro should be removed or the function will be discarded. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:07:02 +02:00
Jia Hongtao	27b8fe8daf	cpufreq: qoriq: Don't show cooling device messages if THERMAL_OF undefined When THERMAL_OF is undefined the cooling device messages should not be shown. -ENOSYS is returned from of_cpufreq_cooling_register() when THERMAL_OF is undefined. Signed-off-by: Jia Hongtao <hongtao.jia@nxp.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 16:00:42 +02:00
Ashwin Chaugule	a29a1e7678	cpufreq: ACPI / CPPC: Add module support for cppc_cpufreq driver Add a function to cleanup at module exit and export appropriate GPL string to enable moduler support for the cppc_cpufreq driver. Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 15:59:35 +02:00
Philippe Longepe	bdcaa23fb8	cpufreq: intel_pstate: Use average P-State instead of current P-State The result returned by pid_calc() is subtracted from current_pstate (which is the P-State requested during the last period) in order to obtain the target P-State for the current iteration. However, current_pstate may not reflect the real current P-State of the CPU. In particular, that P-State may be higher because of the frequency sharing per module. The theory is: - The load is the percentage of time spent in C0 and is related to the average P-State during the same period. - The last requested P-State can be completely different than the average P-State (because of frequency sharing or throttling). - The P-State shift computed by the pid_calc is based on the load computed at average P-State, so the shift must be relative to this average P-State. Using the average P-State instead of current P-State improves power without significant performance penalty in cases when a task migrates from one core to other core sharing frequency and voltage. Performance and power comparison with this patch on Cherry Trail platform using Android: Benchmark ?Perf ?Power FishTank 10.45% 3.1% SmartBench-Gaming -0.1% -10.4% SmartBench-Productivity -0.8% -10.4% CandyCrush n/a -17.4% AngryBirds n/a -5.9% videoPlayback n/a -13.9% audioPlayback n/a -4.9% IcyRocks-20-50 0.0% -38.4% iozone RR -0.16% -1.3% iozone RW 0.74% -1.3% Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-25 15:45:11 +02:00
Rafael J. Wysocki	1cbc99dfe5	Merge back cpufreq changes for v4.7.	2016-04-25 15:44:01 +02:00
Rafael J. Wysocki	94862a62df	Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" Revert commit `0df35026c6` (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) that introduced a regression by causing the ondemand cpufreq governor to misbehave for CONFIG_TICK_CPU_ACCOUNTING unset (the frequency goes up to the max at one point and stays there indefinitely). The revert takes subsequent modifications of the code in question into account. Fixes: `0df35026c6` (cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC) Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261 Reported-and-tested-by: Timo Valtoaho <timo.valtoaho@gmail.com> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-25 02:39:20 +02:00
Rafael J. Wysocki	395da1259a	Merge branch 'pm-cpufreq-fixes' * pm-cpufreq-fixes: cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set intel_pstate: Avoid getting stuck in high P-states when idle	2016-04-21 20:57:46 +02:00
Ingo Molnar	6666ea558b	Linux 4.6-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg= =Q7r3 -----END PGP SIGNATURE----- Merge tag 'v4.6-rc4' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-19 10:38:52 +02:00
Rafael J. Wysocki	c9d9c929e6	cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set Since governor operations are generally skipped if cpufreq_suspended is set, cpufreq_start_governor() should do nothing in that case. That function is called in the cpufreq_online() path, and may also be called from cpufreq_offline() in some cases, which are invoked by the nonboot CPUs disabing/enabling code during system suspend to RAM and resume. That happens when all devices have been suspended, so if the cpufreq driver relies on things like I2C to get the current frequency, it may not be ready to do that then. To prevent problems from happening for this reason, make cpufreq_update_current_freq(), which is the only function invoked by cpufreq_start_governor() that doesn't check cpufreq_suspended already, return 0 upfront if cpufreq_suspended is set. Fixes: `3bbf8fe3ae` (cpufreq: Always update current frequency before startig governor) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-18 23:47:42 +02:00
Borislav Petkov	93984fbd4e	x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: iommu@lists.linux-foundation.org Cc: linux-pm@vger.kernel.org Cc: oprofile-list@lists.sf.net Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-13 11:37:41 +02:00
Paul Gortmaker	6a0bcab9c6	drivers/cpufreq: make ppc_cbe_cpufreq_pmi driver explicitly non-modular The Kconfig for this driver is currently: config CPU_FREQ_CBE_PMI bool "CBE frequency scaling using PMI interface" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular and unused code here, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We also delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Christian Krafft <krafft@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-pm@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-04-11 20:30:43 +10:00
Rafael J. Wysocki	ffb810563c	intel_pstate: Avoid getting stuck in high P-states when idle Jörg Otte reports that commit `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) caused the CPUs in his Haswell-based system to stay in the very high frequency region even if the system is completely idle. That turns out to be an existing problem in the intel_pstate driver's P-state selection algorithm for Core processors. Namely, all decisions made by that algorithm are based on the average frequency of the CPU between sampling events and on the P-state requested on the last invocation, so it may get stuck at a very hight frequency even if the utilization of the CPU is very low (in fact, it may get stuck in a inadequate P-state regardless of the CPU utilization). The only way to kick it out of that limbo is a sufficiently long idle period (3 times longer than the prescribed sampling interval), but if that doesn't happen often enough (eg. due to a timing change like after the above commit), the P-state of the CPU may be inadequate pretty much all the time. To address the most egregious manifestations of that issue, reset the core_busy value used to determine the next P-state to request if the utilization of the CPU, determined with the help of the MPERF feedback register and the TSC, is below 1%. Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771 Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-10 05:59:10 +02:00
Viresh Kumar	8cee1eed8e	cpufreq: ACPI: Remove freq_table from acpi_cpufreq_data The freq-table is stored in struct cpufreq_policy also and there is absolutely no need of keeping a copy of its reference in struct acpi_cpufreq_data. Drop it. Also policy->freq_table can't be NULL in the target() callback, remove the useless check as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:54:31 +02:00
Viresh Kumar	9b55f55af8	cpufreq: ACPI: policy->driver_data can't be NULL in ->exit() Its always set by ->init() and so it will always be there in ->exit(). There is no need to have a special check for just that. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:41:50 +02:00
Rafael J. Wysocki	a794d6138c	cpufreq: Rearrange cpufreq_add_dev() Reorganize the code in cpufreq_add_dev() to avoid using the ret variable and reduce the indentation level in it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-09 01:37:07 +02:00
Rafael J. Wysocki	cd73e9b01f	cpufreq: Simplify switch () in cpufreq_cpu_callback() Merge two switch entries that do the same thing in cpufreq_cpu_callback(). No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-09 01:37:07 +02:00
Joe Perches	1c5864e26c	cpufreq: Use consistent prefixing via pr_fmt Use the more common kernel style adding a define for pr_fmt. Miscellanea: o Remove now unused PFX defines Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:35:18 +02:00
Joe Perches	b49c22a6ca	cpufreq: Convert printk(KERN_<LEVEL> to pr_<level> Use the more common logging style. Miscellanea: o Coalesce formats o Realign arguments o Add a missing space between a coalesced format Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:35:18 +02:00
Joe Perches	4836df173a	intel_pstate: Use pr_fmt Prefix the output using the more common kernel style. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:33:42 +02:00
Geliang Tang	d2499d05f0	cpufreq: mt8173: use list_for_each_entry() Use list_for_each_entry() instead of list_for_each*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:27:54 +02:00
Rafael J. Wysocki	22590efb98	intel_pstate: Avoid pointless FRAC_BITS shifts under div_fp() There are multiple places in intel_pstate where int_tofp() is applied to both arguments of div_fp(), but this is pointless, because int_tofp() simply shifts its argument to the left by FRAC_BITS which mathematically is equivalent to multuplication by 2^FRAC_BITS, so if this is done to both arguments of a division, the extra factors will cancel each other during that operation anyway. Drop the pointless int_tofp() applied to div_fp() arguments throughout the driver. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:25:58 +02:00
Viresh Kumar	2249c00a0b	cpufreq: exynos: Use generic platdev driver The cpufreq-dt-platdev driver supports creation of cpufreq-dt platform device now, reuse that and remove similar code from platform code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:18:42 +02:00
Viresh Kumar	f56aad1d98	cpufreq: dt: Add generic platform-device creation support Multiple platforms are using the generic cpufreq-dt driver now, and all of them are required to create a platform device with name "cpufreq-dt", in order to get the cpufreq-dt probed. Many of them do it from platform code, others have special drivers just to do that. It would be more sensible to do this at a generic place, where all such platform can mark their entries. This patch adds a separate file to get this device created. Currently the compat list of platforms that we support is empty, and will be filled in as and when we move platforms to use it. It always compiles as part of the kernel and so doesn't need a module-exit operation. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:18:42 +02:00
Viresh Kumar	f3f24dea2c	cpufreq: tegra124: No need of setting platform-data All CPUs on Tegra platform share clock/voltage lines and there is absolutely no need of setting platform data for 'cpufreq-dt' platform device, as that's the default case. Stop setting platform data for cpufreq-dt device. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Thierry Reding <treding@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:12:09 +02:00
Paul Gortmaker	dbbe972c11	cpufreq: ppc_cbe_cpufreq_pmi: make the driver explicitly non-modular The Kconfig for this driver is currently: config CPU_FREQ_CBE_PMI bool "CBE frequency scaling using PMI interface" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular and unused code here, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We also delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-09 01:11:04 +02:00
Rafael J. Wysocki	4b42fafc1c	Merge branch 'pm-cpufreq-sched' into pm-cpufreq	2016-04-09 01:08:02 +02:00
Rafael J. Wysocki	6c9d9c8192	cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit() Due to differences in the cpufreq core's handling of runtime CPU offline and nonboot CPUs disabling during system suspend-to-RAM, fast frequency switching gets disabled after a suspend-to-RAM and resume cycle on all of the nonboot CPUs. To prevent that from happening, move the invocation of cpufreq_disable_fast_switch() from cpufreq_exit_governor() to sugov_exit(), as the schedutil governor is the only user of fast frequency switching today anyway. That simply prevents cpufreq_disable_fast_switch() from being called without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT event (which happens during system suspend now). Fixes: `b7898fda5b` (cpufreq: Support for fast frequency switching) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-08 22:41:36 +02:00
Rafael J. Wysocki	fa81e66ec8	Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc' * pm-cpufreq: cpufreq: dt: Drop stale comment cpufreq: intel_pstate: Documenation for structures cpufreq: intel_pstate: fix inconsistency in setting policy limits intel_pstate: Avoid extra invocation of intel_pstate_sample() intel_pstate: Do not set utilization update hook too early * pm-cpuidle: intel_idle: Add KBL support intel_idle: Add SKX support intel_idle: Clean up all registered devices on exit. intel_idle: Propagate hot plug errors. intel_idle: Don't overreact to a cpuidle registration failure. intel_idle: Setup the timer broadcast only on successful driver load. intel_idle: Avoid a double free of the per-CPU data. intel_idle: Fix dangling registration on error path. intel_idle: Fix deallocation order on the driver exit path. intel_idle: Remove redundant initialization calls. intel_idle: Fix a helper function's return value. intel_idle: remove useless return from void function. * acpi-cppc: mailbox: pcc: Don't access an unmapped memory address space	2016-04-08 21:46:05 +02:00
Viresh Kumar	b318556479	cpufreq: dt: Drop stale comment The comment in file header doesn't hold true anymore, drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-05 03:40:44 +02:00
Srinivas Pandruvada	13ad7701f9	cpufreq: intel_pstate: Documenation for structures No code change. Only added kernel doc style comments for structures. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-05 03:39:05 +02:00
Srinivas Pandruvada	30a3915385	cpufreq: intel_pstate: fix inconsistency in setting policy limits When user sets performance policy using cpufreq interface, it is possible that because of policy->max limits, the actual performance is still limited. But the current implementation will silently switch the policy to powersave and start using powersave limits. If user modifies any limits using intel_pstate sysfs, this is actually changing powersave limits. The current implementation tracks limits under powersave and performance policy using two different variables. When policy->max is less than policy->cpuinfo.max_freq, only powersave limit variable is used. This fix causes the performance limits variable to be used always when the policy is performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-05 03:37:13 +02:00
Rafael J. Wysocki	9bdcb44e39	cpufreq: schedutil: New governor based on scheduler utilization data Add a new cpufreq scaling governor, called "schedutil", that uses scheduler-provided CPU utilization information as input for making its decisions. Doing that is possible after commit `34e2c555f3` (cpufreq: Add mechanism for registering utilization update callbacks) that introduced cpufreq_update_util() called by the scheduler on utilization changes (from CFS) and RT/DL task status updates. In particular, CPU frequency scaling decisions may be based on the the utilization data passed to cpufreq_update_util() by CFS. The new governor is relatively simple. The frequency selection formula used by it depends on whether or not the utilization is frequency-invariant. In the frequency-invariant case the new CPU frequency is given by next_freq = 1.25 * max_freq * util / max where util and max are the last two arguments of cpufreq_update_util(). In turn, if util is not frequency-invariant, the maximum frequency in the above formula is replaced with the current frequency of the CPU: next_freq = 1.25 * curr_freq * util / max The coefficient 1.25 corresponds to the frequency tipping point at (util / max) = 0.8. All of the computations are carried out in the utilization update handlers provided by the new governor. One of those handlers is used for cpufreq policies shared between multiple CPUs and the other one is for policies with one CPU only (and therefore it doesn't need to use any extra synchronization means). The governor supports fast frequency switching if that is supported by the cpufreq driver in use and possible for the given policy. In the fast switching case, all operations of the governor take place in its utilization update handlers. If fast switching cannot be used, the frequency switch operations are carried out with the help of a work item which only calls __cpufreq_driver_target() (under a mutex) to trigger a frequency update (to a value already computed beforehand in one of the utilization update handlers). Currently, the governor treats all of the RT and DL tasks as "unknown utilization" and sets the frequency to the allowed maximum when updated from the RT or DL sched classes. That heavy-handed approach should be replaced with something more subtle and specifically targeted at RT and DL tasks. The governor shares some tunables management code with the "ondemand" and "conservative" governors and uses some common definitions from cpufreq_governor.h, but apart from that it is stand-alone. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-04-02 01:09:12 +02:00
Rafael J. Wysocki	b7898fda5b	cpufreq: Support for fast frequency switching Modify the ACPI cpufreq driver to provide a method for switching CPU frequencies from interrupt context and update the cpufreq core to support that method if available. Introduce a new cpufreq driver callback, ->fast_switch, to be invoked for frequency switching from interrupt context by (future) governors supporting that feature via (new) helper function cpufreq_driver_fast_switch(). Add two new policy flags, fast_switch_possible, to be set by the cpufreq driver if fast frequency switching can be used for the given policy and fast_switch_enabled, to be set by the governor if it is going to use fast frequency switching for the given policy. Also add a helper for setting the latter. Since fast frequency switching is inherently incompatible with cpufreq transition notifiers, make it possible to set the fast_switch_enabled only if there are no transition notifiers already registered and make the registration of new transition notifiers fail if fast_switch_enabled is set for at least one policy. Implement the ->fast_switch callback in the ACPI cpufreq driver and make it set fast_switch_possible during policy initialization as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:03 +02:00
Rafael J. Wysocki	379480d825	cpufreq: Move governor symbols to cpufreq.h Move definitions of symbols related to transition latency and sampling rate to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	66893b6ac9	cpufreq: Move governor attribute set headers to cpufreq.h Move definitions and function headers related to struct gov_attr_set to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	2d0c58ad60	cpufreq: governor: Move abstract gov_attr_set code to seperate file Move abstract code related to struct gov_attr_set to a separate (new) file so it can be shared with (future) goverernors that won't share more code with "ondemand" and "conservative". No intentional functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:01 +02:00
Rafael J. Wysocki	0dd3c1d678	cpufreq: governor: New data type for management part of dbs_data In addition to fields representing governor tunables, struct dbs_data contains some fields needed for the management of objects of that type. As it turns out, that part of struct dbs_data may be shared with (future) governors that won't use the common code used by "ondemand" and "conservative", so move it to a separate struct type and modify the code using struct dbs_data to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:00 +02:00
Rafael J. Wysocki	0bed612be6	cpufreq: sched: Helpers to add and remove update_util hooks Replace the single helper for adding and removing cpufreq utilization update hooks, cpufreq_set_update_util_data(), with a pair of helpers, cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(), and modify the users of cpufreq_set_update_util_data() accordingly. With the new helpers, the code using them doesn't need to worry about the internals of struct update_util_data and in particular it doesn't need to worry about populating the func field in it properly upfront. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-04-02 01:08:43 +02:00
Rafael J. Wysocki	9fa64d6424	Merge back intel_pstate fixes for v4.6. * pm-cpufreq: intel_pstate: Avoid extra invocation of intel_pstate_sample() intel_pstate: Do not set utilization update hook too early	2016-04-02 01:07:17 +02:00
Rafael J. Wysocki	febce40feb	intel_pstate: Avoid extra invocation of intel_pstate_sample() The initialization of intel_pstate for a given CPU involves populating the fields of its struct cpudata that represent the previous sample, but currently that is done in a problematic way. Namely, intel_pstate_init_cpu() makes an extra call to intel_pstate_sample() so it reads the current register values that will be used to populate the "previous sample" record during the next invocation of intel_pstate_sample(). However, after commit `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) that doesn't work for last_sample_time, because the time value is passed to intel_pstate_sample() as an argument now. Passing 0 to it from intel_pstate_init_cpu() is problematic, because that causes cpu->last_sample_time == 0 to be visible in get_target_pstate_use_performance() (and hence the extra cpu->last_sample_time > 0 check in there) and effectively allows the first invocation of intel_pstate_sample() from intel_pstate_update_util() to happen immediately after the initialization which may lead to a significant "turn on" effect in the governor algorithm. To mitigate that issue, rework the initialization to avoid the extra intel_pstate_sample() call from intel_pstate_init_cpu(). Instead, make intel_pstate_sample() return false if it has been called with cpu->sample.time equal to zero, which will make intel_pstate_update_util() skip the sample in that case, and reset cpu->sample.time from intel_pstate_set_update_util_hook() to make the algorithm start properly every time the hook is set. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-04-02 01:06:21 +02:00
Rafael J. Wysocki	bb6ab52f2b	intel_pstate: Do not set utilization update hook too early The utilization update hook in the intel_pstate driver is set too early, as it only should be set after the policy has been fully initialized by the core. That may cause intel_pstate_update_util() to use incorrect data and put the CPUs into incorrect P-states as a result. To prevent that from happening, make intel_pstate_set_policy() set the utilization update hook instead of intel_pstate_init_cpu() so intel_pstate_update_util() only runs when all things have been initialized as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-31 17:42:15 +02:00
Linus Torvalds	3d66c6ba3f	Power management and ACPI material for v4.6-rc1, part 2 - Fix for an intel_pstate driver issue related to the handling of MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki). - cpufreq core cleanups related to starting governors and frequency synchronization during resume from system suspend and a locking fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran). - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang, Michael Neuling, Richard Cochran, Shilpasri Bhat). - intel_idle driver update preventing some Skylake-H systems from hanging during initialization by disabling deep C-states mishandled by the platform in the problematic configurations (Len Brown). - Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman Chandramouli). - cpuidle menu governor updates to make it always honor PM QoS latency constraints (and prevent C1 from being used as the fallback C-state on x86 when they are set below its exit latency) and to restore the previous behavior to fall back to C1 if the next timer event is set far enough in the future that was changed in 4.4 which led to an energy consumption regression (Rik van Riel, Rafael Wysocki). - New device ID for a future AMD UART controller in the ACPI driver for AMD SoCs (Wang Hongcheng). - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage scaling (AVS) driver (David Wu). - ACPI PCI resources management fix for the handling of IO space resources on architectures where the IO space is memory mapped (IA64 and ARM64) broken by the introduction of common ACPI resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi). - Fix for the ACPI backend of the generic device properties API to make it parse non-device (data node only) children of an ACPI device correctly (Irina Tirdea). - Fixes for the handling of global suspend flags (introduced in 4.4) during hibernation and resume from it (Lukas Wunner). - Support for obtaining configuration information from Device Trees in the PM clocks framework (Jon Hunter). - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian King, Geert Uytterhoeven). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJW9JaRAAoJEILEb/54YlRx/GAQAJujANWilWHZYm24a9JDcIE9 rsNZIC/FdeBVilPtRTZQnig/Pj32Z4Jm7IZ/DLOq0Deu1YK/9uv3y59M3BcX6WyL H5VR80L8geUJZ7RRk0WfM5D4X82ovzwpE/kWt2Z7HDuvJSCBmFBZOvNrXbaRncKD jIvat/p6uCuxt5c08+ebnBLQ6tOs8wLTWiCx3fO128GIrGRGN2xFV6hzRWVGnJ4g WXGAR+AdLxRMZz4PPmqdTfRj4TNSR071GjKyaeKfZUjQGAsf5O9A77JFjeNVomDx g1K37Byid2bTByzVavlEXPJZ7eKb5dAhlo7IJ9HAcOAXChLqH2Czjrpd+1XjR9MF SV/78rCnF8eet83QYLbGV/Mzf7gbJP2Xp6wiaM22VAPpGe+sYfphJoQka9XRTfId OgAjyYMYdWAKo5DhxVNI8WyN0W5dsoBFPxnaUFhHSGDCIJH7Ksy20m6y3plG2Bxf ahoiQhmd9ohjtB5JbRnf4MY0hjekp8Srdf+DoNKsk/+JscIyROpYY3msQ3smUKo+ f628MC/wAosMpSV+l+KOYkbjCbtB49IabWtZ//NVD9hYB3E1f6aTN59yFbWB+1rp L7Y8iaxzSkyJy/yYVuBal3rSk356+BvvoXBlLXmBsyu1TMlcDjALIYztSiTVT5MB RZBhgNwdkxNCYJfU3ex+ =hUVj -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management and ACPI updates from Rafael Wysocki: "The second batch of power management and ACPI updates for v4.6. Included are fixups on top of the previous PM/ACPI pull request and other material that didn't make into it but still should go into 4.6. Among other things, there's a fix for an intel_pstate driver issue uncovered by recent cpufreq changes, a workaround for a boot hang on Skylake-H related to the handling of deep C-states by the platform and a PCI/ACPI fix for the handling of IO port resources on non-x86 architectures plus some new device IDs and similar. Specifics: - Fix for an intel_pstate driver issue related to the handling of MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki). - cpufreq core cleanups related to starting governors and frequency synchronization during resume from system suspend and a locking fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran). - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang, Michael Neuling, Richard Cochran, Shilpasri Bhat). - intel_idle driver update preventing some Skylake-H systems from hanging during initialization by disabling deep C-states mishandled by the platform in the problematic configurations (Len Brown). - Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman Chandramouli). - cpuidle menu governor updates to make it always honor PM QoS latency constraints (and prevent C1 from being used as the fallback C-state on x86 when they are set below its exit latency) and to restore the previous behavior to fall back to C1 if the next timer event is set far enough in the future that was changed in 4.4 which led to an energy consumption regression (Rik van Riel, Rafael Wysocki). - New device ID for a future AMD UART controller in the ACPI driver for AMD SoCs (Wang Hongcheng). - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage scaling (AVS) driver (David Wu). - ACPI PCI resources management fix for the handling of IO space resources on architectures where the IO space is memory mapped (IA64 and ARM64) broken by the introduction of common ACPI resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi). - Fix for the ACPI backend of the generic device properties API to make it parse non-device (data node only) children of an ACPI device correctly (Irina Tirdea). - Fixes for the handling of global suspend flags (introduced in 4.4) during hibernation and resume from it (Lukas Wunner). - Support for obtaining configuration information from Device Trees in the PM clocks framework (Jon Hunter). - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian King, Geert Uytterhoeven)" * tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) PM / AVS: rockchip-io: add io selectors and supplies for rk3399 intel_idle: Support for Intel Xeon Phi Processor x200 Product Family intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled ACPI / PM: Runtime resume devices when waking from hibernate PM / sleep: Clear pm_suspend_global_flags upon hibernate cpufreq: governor: Always schedule work on the CPU running update cpufreq: Always update current frequency before startig governor cpufreq: Introduce cpufreq_update_current_freq() cpufreq: Introduce cpufreq_start_governor() cpufreq: powernv: Add sysfs attributes to show throttle stats cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static PCI: ACPI: IA64: fix IO port generic range check ACPI / util: cast data to u64 before shifting to fix sign extension cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path cpuidle: menu: Fall back to polling if next timer event is near cpufreq: acpi-cpufreq: Clean up hot plug notifier callback intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts cpufreq: Make cpufreq_quick_get() safe to call ACPI / property: fix data node parsing in acpi_get_next_subnode() ACPI / APD: Add device HID for future AMD UART controller ...	2016-03-24 22:59:58 -07:00
Rafael J. Wysocki	33068b61f8	Merge branches 'pm-cpufreq' and 'pm-cpuidle' * pm-cpufreq: cpufreq: governor: Always schedule work on the CPU running update cpufreq: Always update current frequency before startig governor cpufreq: Introduce cpufreq_update_current_freq() cpufreq: Introduce cpufreq_start_governor() cpufreq: powernv: Add sysfs attributes to show throttle stats cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path cpufreq: acpi-cpufreq: Clean up hot plug notifier callback intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts cpufreq: Make cpufreq_quick_get() safe to call * pm-cpuidle: intel_idle: Support for Intel Xeon Phi Processor x200 Product Family intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled cpuidle: menu: Fall back to polling if next timer event is near cpuidle: menu: use high confidence factors only when considering polling	2016-03-25 00:57:22 +01:00
Rafael J. Wysocki	539a4c4247	cpufreq: governor: Always schedule work on the CPU running update Modify dbs_irq_work() to always schedule the process-context work on the current CPU which also ran the dbs_update_util_handler() that the irq_work being handled came from. This causes the entire frequency update handling (involving the "ondemand" or "conservative" governors) to be carried out by the CPU whose frequency is to be updated and reduces the overall amount of inter-CPU noise related to cpufreq. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-22 23:13:36 +01:00
Rafael J. Wysocki	3bbf8fe3ae	cpufreq: Always update current frequency before startig governor Make policy->cur match the current frequency returned by the driver's ->get() callback before starting the governor in case they went out of sync in the meantime and drop the piece of code attempting to resync policy->cur with the real frequency of the boot CPU from cpufreq_resume() as it serves no purpose any more (and it's racy and super-ugly anyway). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-22 23:13:36 +01:00
Rafael J. Wysocki	999f572983	cpufreq: Introduce cpufreq_update_current_freq() Move the part of cpufreq_update_policy() that obtains the current frequency from the driver and updates policy->cur if necessary to a separate function, cpufreq_get_current_freq(). That should not introduce functional changes and subsequent change set will need it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-22 23:13:36 +01:00
Rafael J. Wysocki	0a300767e5	cpufreq: Introduce cpufreq_start_governor() Starting a governor in cpufreq always follows the same pattern involving two calls to cpufreq_governor(), one with the event argument set to CPUFREQ_GOV_START and one with that argument set to CPUFREQ_GOV_LIMITS. Introduce cpufreq_start_governor() that will carry out those two operations and make all places where governors are started use it. That slightly modifies the behavior of cpufreq_set_policy() which now also will go back to the old governor if the second call to cpufreq_governor() (the one with event equal to CPUFREQ_GOV_LIMITS) fails, but that really is how it should work in the first place. Also cpufreq_resume() will now pring an error message if the CPUFREQ_GOV_LIMITS call to cpufreq_governor() fails, but that makes it follow cpufreq_add_policy_cpu() and cpufreq_offline() in that respect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-22 23:13:36 +01:00
Shilpasri G Bhat	1b0289848d	cpufreq: powernv: Add sysfs attributes to show throttle stats Create sysfs attributes to export throttle information in /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The newly added sysfs files are as follows: 1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat 2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat 3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle 4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap 5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp 6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault 7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent 8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset Detailed explanation of each attribute is added to Documentation/ABI/testing/sysfs-devices-system-cpu Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-22 23:11:18 +01:00
Jisheng Zhang	ac13b9967d	cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static These frequency register read/write operations' implementations for the given processor (Intel/AMD MSR access or I/O port access) are only used internally in acpi-cpufreq, so make them static. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-22 23:09:50 +01:00
Michael Neuling	3e5963bc34	cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path Commit `96c4726f01` "cpufreq: powernv: Remove cpu_to_chip_id() from hot-path" introduced a 'core_to_chip_map' array to cache the chip-ids of all cores. Replace this with a per-CPU variable that stores the pointer to the chip-array. This removes the linear lookup and provides a neater and simpler solution. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-22 01:08:55 +01:00
Linus Torvalds	46e595a17d	ARM: SoC driver updates for v4.6 Driver updates for ARM SoCs, these contain various things that touch the drivers/ directory but got merged through arm-soc for practical reasons: - Rockchip rk3368 gains power domain support - Small updates for the ARM spmi driver - The Atmel PMC driver saw a larger rework, touching both arch/arm/mach-at91 and drivers/clk/at91 - All reset controller driver changes alway get merged through arm-soc, though this time the largest change is the addition of a MIPS pistachio reset driver - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVu67OmCrR//JCVInAQJ64hAAqNemdAMloJhh8mk4O74egd/XNE8GLK3v gGefpZNi0TC8u/GWMhU1aFCElaCmbNlL0IlqaRrU/vydOmQcZYht7Fg3bAm4r3ck TlKijGTJap4sdHhxSeui+7bhaBToxcklQTdcrKFgOwsype7CAWJCl5otIC/GHO5L fn4QSjQbqr5kqH1XfuVIphj/fJjDKRRze5D7zn0nExq46OyoYyjc2lm/QkLgeeS2 vDpzOULYXcjf5GfsPknCJGGjenISD7cIAwZukGvJXFh8WrXkEPZZ7B7bBI/8ZeBU MkdWvOm9fHEWpIPnuTcLeQNlfdzQ0Z0zijgJqnXjwSYXK2Es1UKEoIFvZUyGA9zG uyLtddFcKbP4QBDUKVMbyYM6x4Cj7LO96dB2pe8iH5rvnoLS32EjJ/4glnbPQFB7 75JKb7eU1pijoy9c3x/G10vINHzbPjyUN3sYTFKMomPFzEF4OVQ3GDclSuD7jjDr GnqmAqlj29+qGU6iQBBHp9TfLTxwrs/4MKPEZ+tTGvtINnzOpLGA3TUnji7nVFQc BYy3qaEvg9MfHI3uXhAl2L4CGCVvHfqFs5B7giZfAkbbcTNAHs9PkZ6gMYH+GG3p tEbTf/dMHmkkqttSz4f7LZS7D56cSfm3cD8kFCRJPLKifmGAk3w1HZ7JoCXdjr1K 22HSKRMxlhU= =HS4G -----END PGP SIGNATURE----- Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "Driver updates for ARM SoCs, these contain various things that touch the drivers/ directory but got merged through arm-soc for practical reasons: - Rockchip rk3368 gains power domain support - Small updates for the ARM spmi driver - The Atmel PMC driver saw a larger rework, touching both arch/arm/mach-at91 and drivers/clk/at91 - All reset controller driver changes alway get merged through arm-soc, though this time the largest change is the addition of a MIPS pistachio reset driver - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits) bus: imx-weim: Take the 'status' property value into account clk: at91: remove useless includes clk: at91: pmc: remove useless capacities handling clk: at91: pmc: drop at91_pmc_base usb: gadget: atmel: access the PMC using regmap ARM: at91: remove useless includes and function prototypes ARM: at91: pm: move idle functions to pm.c ARM: at91: pm: find and remap the pmc ARM: at91: pm: simply call at91_pm_init clk: at91: pmc: move pmc structures to C file clk: at91: pmc: merge at91_pmc_init in atmel_pmc_probe clk: at91: remove IRQ handling and use polling clk: at91: make use of syscon/regmap internally clk: at91: make use of syscon to share PMC registers in several drivers hwmon: (scpi) add energy meter support firmware: arm_scpi: add support for 64-bit sensor values firmware: arm_scpi: decrease Tx timeout to 20ms firmware: arm_scpi: fix send_message and sensor_get_value for big-endian reset: sti: Make reset_control_ops const reset: zynq: Make reset_control_ops const ...	2016-03-20 15:40:32 -07:00
Linus Torvalds	33b3d2e88c	ARM: SoC platform updates for v4.6 Newly added support for additional SoCs: - Axis Artpec-6 SoC family - Allwinner A83T SoC - Mediatek MT7623 - NXP i.MX6QP SoC - ST Microelectronics stm32f469 microcontroller New features: - SMP support for Mediatek mt2701 - Big-endian support for NXP i.MX - DaVinci now uses the new DMA engine dma_slave_map - OMAP now uses the new DMA engine dma_slave_map - earlyprintk support for palmchip uart on mach-tango - delay timer support for orion Other: - Exynos PMU driver moved out to drivers/soc/ - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP, uniphier -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVu68DGCrR//JCVInAQIHVQ//Wblms+NKj3aKh6m2Sscs/YkSbFaQ4sY2 rNyfxLIYsLXkth1kbdHRFSMyL68Ym+xutErgw/3HQPB2D1YtYJE3VJ/y8AU92SU3 oHyQIty+atB8d8zBbtlkWmat94NIfYf0I8PQETreGb1LMaJqAf0mDEDAyorTLZcZ UtQ817Ihn7urqwdTJpTO58V41RmY/vflbHI5T6bIjUJn6fF1e/7+VqtMIfq5sjJ6 0EPEQdu8s5AJ7gcGlGi9I5gAtSnWSA/9phAxul9P8/HrMpUWIxreSEAy8FY7W14F 4TON3sQrnw7nyA72U80KGIXhgLy7SbEmHcSqyy4YJK3ycdk6VYk0CBO7nWVYAiD1 knLisOH6jwe0LIj9WXiRR+Y2Q53pXN8SF77pLDahSnvuShnYEjEH5uELHtxe7Vxh gn+NH1rDkRTgdYgt4RWlVyUoLkddQWzLb1m4QyQlvxtTR25cJJayXdVX2MRrNPF5 c1zRa9HH+b8LJQIMdWfo/NoHhHtftkkGGsqHAAaypZqdpyk0j2HpJYk5ecPR4f5C /8o/h/5xOI9gEzp/DVYSZ1VAvRqBQGIDfKBXWq6GuoZaF0aN8ISe5IxFn5Yx2F46 fNaxqiNpWmyywl8D+tSWPFK6aE21AXKGi5zIzexZZqy283aDjlUPI+tgF2GKIuKP 3ayYTDeBpLI= =ynNj -----END PGP SIGNATURE----- Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "Newly added support for additional SoCs: - Axis Artpec-6 SoC family - Allwinner A83T SoC - Mediatek MT7623 - NXP i.MX6QP SoC - ST Microelectronics stm32f469 microcontroller New features: - SMP support for Mediatek mt2701 - Big-endian support for NXP i.MX - DaVinci now uses the new DMA engine dma_slave_map - OMAP now uses the new DMA engine dma_slave_map - earlyprintk support for palmchip uart on mach-tango - delay timer support for orion Other: - Exynos PMU driver moved out to drivers/soc/ - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP, uniphier" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (83 commits) ARM: uniphier: rework SMP code to support new System Bus binding ARM: uniphier: add missing of_node_put() ARM: at91: avoid defining CONFIG_* symbols in source code ARM: DRA7: hwmod: Add data for eDMA tpcc, tptc0, tptc1 ARM: imx: Make reset_control_ops const ARM: imx: Do L2 errata only if the L2 cache isn't enabled ARM: imx: select ARM_CPU_SUSPEND only for imx6 dmaengine: pxa_dma: fix the maximum requestor line ARM: alpine: select the Alpine MSI controller driver ARM: pxa: add the number of DMA requestor lines dmaengine: mmp-pdma: add number of requestors dma: mmp_pdma: Add the #dma-requests DT property documentation ARM: OMAP2+: Add rtc hwmod configuration for ti81xx ARM: s3c24xx: Avoid warning for inb/outb ARM: zynq: Move early printk virtual address to vmalloc area ARM: DRA7: hwmod: Add custom reset handler for PCIeSS ARM: SAMSUNG: Remove unused register offset definition ARM: EXYNOS: Cleanup header files inclusion drivers: soc: samsung: Enable COMPILE_TEST MAINTAINERS: Add maintainers entry for drivers/soc/samsung ...	2016-03-20 14:57:08 -07:00
Richard Cochran	ed72662a84	cpufreq: acpi-cpufreq: Clean up hot plug notifier callback This driver has two issues. First, it tries to fiddle with the hot plugged CPU's MSR on the UP_PREPARE event, at a time when the CPU is not yet online. Second, the driver sets the "boost-disable" bit for a CPU when going down, but does not clear the bit again if the CPU comes up again due to DOWN_FAILED. This patch fixes the issues by changing the driver to react to the ONLINE/DOWN_FAILED events instead of UP_PREPARE. As an added benefit, the driver also becomes symmetric with respect to the hot plug mechanism. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-20 00:39:04 +01:00
Rafael J. Wysocki	fdfdb2b130	intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts After commit `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) wrmsrl_on_cpu() cannot be called in the intel_pstate_adjust_busy_pstate() path as that is executed with disabled interrupts. However, atom_set_pstate() called from there via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in smp_call_function_single(). The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is because intel_pstate_set_pstate() calling it is also invoked during the initialization and cleanup of the driver and in those cases it is not guaranteed to be run on the CPU that is being updated. However, in the case when intel_pstate_set_pstate() is called by intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update the register safely. Moreover, intel_pstate_set_pstate() already contains code that only is executed if the function is called by intel_pstate_adjust_busy_pstate() and there is a special argument passed to it because of that. To fix the problem at hand, rearrange the code taking the above observations into account. First, replace the ->set() callback in struct pstate_funcs with a ->get_val() one that will return the value to be written to the IA32_PERF_CTL MSR without updating the register. Second, split intel_pstate_set_pstate() into two functions, intel_pstate_update_pstate() to be called by intel_pstate_adjust_busy_pstate() that will contain all of the intel_pstate_set_pstate() code which only needs to be executed in that case and will use wrmsrl() to update the MSR (after obtaining the value to write to it from the ->get_val() callback), and intel_pstate_set_min_pstate() to be invoked during the initialization and cleanup that will set the P-state to the minimum one and will update the MSR using wrmsrl_on_cpu(). Finally, move the code shared between intel_pstate_update_pstate() and intel_pstate_set_min_pstate() to a new static inline function intel_pstate_record_pstate() and make them both call it. Of course, that unifies the handling of the IA32_PERF_CTL MSR writes between Atom and Core. Fixes: `a4675fbc4a` (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-20 00:37:09 +01:00
Richard Cochran	c75361c0b0	cpufreq: Make cpufreq_quick_get() safe to call The function, cpufreq_quick_get, accesses the global 'cpufreq_driver' and its fields without taking the associated lock, cpufreq_driver_lock. Without the locking, nothing guarantees that 'cpufreq_driver' remains consistent during the call. This patch fixes the issue by taking the lock before accessing the data structure. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-18 01:49:01 +01:00
Rafael J. Wysocki	4ed3900427	Merge branch 'pm-cpufreq' * pm-cpufreq: (94 commits) intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() intel_pstate: Optimize calculation for max/min_perf_adj intel_pstate: Remove extra conversions in pid calculation cpufreq: Move scheduler-related code to the sched directory Revert "cpufreq: postfix policy directory with the first CPU in related_cpus" cpufreq: Reduce cpufreq_update_util() overhead a bit cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set cpufreq: Remove 'policy->governor_enabled' cpufreq: Rename __cpufreq_governor() to cpufreq_governor() cpufreq: Relocate handle_update() to kill its declaration cpufreq: governor: Drop unnecessary checks from show() and store() cpufreq: governor: Fix race in dbs_update_util_handler() cpufreq: governor: Make gov_set_update_util() static cpufreq: governor: Narrow down the dbs_data_mutex coverage cpufreq: governor: Make dbs_data_mutex static cpufreq: governor: Relocate definitions of tuners structures cpufreq: governor: Move per-CPU data to the common code cpufreq: governor: Make governor private data per-policy ...	2016-03-14 14:22:03 +01:00
Rafael J. Wysocki	4fec7ad5f6	intel_pstate: Do not skip samples partially If the current value of MPERF or the current value of TSC is the same as the previous one, respectively, intel_pstate_sample() bails out early and skips the sample. However, intel_pstate_adjust_busy_pstate() is still called in that case which is not correct, so modify intel_pstate_sample() to return a bool value indicating whether or not the sample has been taken and use it to decide whether or not to call intel_pstate_adjust_busy_pstate(). While at it, remove redundant parentheses from the MPERF/TSC check in intel_pstate_sample(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2016-03-11 00:07:51 +01:00
Philippe Longepe	8fa520af50	intel_pstate: Remove freq calculation from intel_pstate_calc_busy() Use a helper function to compute the average pstate and call it only where it is needed (only when tracing or in intel_pstate_get). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-11 00:04:58 +01:00
Philippe Longepe	7349ec0470	intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(), so move that call from intel_pstate_sample() to get_target_pstate_use_performance(). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-11 00:04:58 +01:00
Philippe Longepe	a158bed5dc	intel_pstate: Optimize calculation for max/min_perf_adj mul_fp(int_tofp(A), B) expands to: ((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained via simple multiplication A * B. Apply this observation to max_perf * limits->max_perf and max_perf * limits->min_perf in intel_pstate_get_min_max()." Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-11 00:04:58 +01:00
Philippe Longepe	b54a0dfd56	intel_pstate: Remove extra conversions in pid calculation pid->setpoint and pid->deadband can be initialized in fixed point, so we can avoid the int_tofp in pid_calc. Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-11 00:04:42 +01:00
Rafael J. Wysocki	a5acbfbd70	Merge branch 'pm-cpufreq-governor' into pm-cpufreq	2016-03-10 20:46:03 +01:00
Rafael J. Wysocki	adaf9fcd13	cpufreq: Move scheduler-related code to the sched directory Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-03-10 20:44:47 +01:00
Viresh Kumar	edd4a893e0	Revert "cpufreq: postfix policy directory with the first CPU in related_cpus" Revert commit `3510fac454` (cpufreq: postfix policy directory with the first CPU in related_cpus). Earlier, the policy->kobj was added to the kobject core, before ->init() callback was called for the cpufreq drivers. Which allowed those drivers to add or remove, driver dependent, sysfs files/directories to the same kobj from their ->init() and ->exit() callbacks. That isn't possible anymore after commit `3510fac454`. Now, there is no other clean alternative that people can adopt. Its better to revert the earlier commit to allow cpufreq drivers to create/remove sysfs files from ->init() and ->exit() callbacks. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 21:42:45 +01:00
Rafael J. Wysocki	08f511fd41	cpufreq: Reduce cpufreq_update_util() overhead a bit Use the observation that cpufreq_update_util() is only called by the scheduler with rq->lock held, so the callers of cpufreq_set_update_util_data() can use synchronize_sched() instead of synchronize_rcu() to wait for cpufreq_update_util() to complete. Moreover, if they are updated to do that, rcu_read_(un)lock() calls in cpufreq_update_util() might be replaced with rcu_read_(un)lock_sched(), respectively, but those aren't really necessary, because the scheduler calls that function from RCU-sched read-side critical sections already. In addition to that, if cpufreq_set_update_util_data() checks the func field in the struct update_util_data before setting the per-CPU pointer to it, the data->func check may be dropped from cpufreq_update_util() as well. Make the above changes to reduce the overhead from cpufreq_update_util() in the scheduler paths invoking it and to make the cleanup after removing its callbacks less heavy-weight somewhat. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-03-09 15:07:58 +01:00
Rafael J. Wysocki	e6f036571e	cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set Commit 0eb463be3436 (cpufreq: governor: Replace timers with utilization update callbacks) made CPU_FREQ select IRQ_WORK, but that's not necessary, as it is sufficient for IRQ_WORK to be selected by CPU_FREQ_GOV_COMMON, so modify the cpufreq Kconfig to that effect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 15:01:37 +01:00
Viresh Kumar	242aa883a6	cpufreq: Remove 'policy->governor_enabled' The entire sequence of events (like INIT/START or STOP/EXIT) for which cpufreq_governor() is called, is guaranteed to be protected by policy->rwsem now. The additional checks that were added earlier (as we were forced to drop policy->rwsem before calling cpufreq_governor() for EXIT event), aren't required anymore. Over that, they weren't sufficient really. They just take care of START/STOP events, but not INIT/EXIT and the state machine was never maintained properly by them. Kill the unnecessary checks and policy->governor_enabled field. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:12 +01:00
Viresh Kumar	a1317e091a	cpufreq: Rename __cpufreq_governor() to cpufreq_governor() The __ at the beginning of the routine aren't really necessary at all. Rename it to cpufreq_governor() instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:11 +01:00
Viresh Kumar	11eb69b984	cpufreq: Relocate handle_update() to kill its declaration handle_update() is declared at the top of the file as its user appear before its definition. Relocate the routine to get rid of this. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:11 +01:00
Viresh Kumar	f737236b12	cpufreq: governor: Drop unnecessary checks from show() and store() The show() and store() routines in the cpufreq-governor core don't need to check if the struct governor_attr they want to use really provides the callbacks they need as expected (if that's not the case, it means a bug in the code anyway), so change them to avoid doing that. Also change the error value to -EBUSY, if the governor is getting removed and we aren't allowed to store any more changes. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:11 +01:00
Rafael J. Wysocki	27de348239	cpufreq: governor: Fix race in dbs_update_util_handler() There is a scenario that may lead to undesired results in dbs_update_util_handler(). Namely, if two CPUs sharing a policy enter the funtion at the same time, pass the sample delay check and then one of them is stalled until dbs_work_handler() (queued up by the other CPU) clears the work counter, it may update the work counter and queue up another work item prematurely. To prevent that from happening, use the observation that the CPU queuing up a work item in dbs_update_util_handler() updates the last sample time. This means that if another CPU was stalling after passing the sample delay check and now successfully updated the work counter as a result of the race described above, it will see the new value of the last sample time which is different from what it used in the sample delay check before. If that happens, the sample delay check passed previously is not valid any more, so the CPU should not continue. Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:10 +01:00
Rafael J. Wysocki	94ab5e030f	cpufreq: governor: Make gov_set_update_util() static The gov_set_update_util() routine is only used internally by the common governor code and it doesn't need to be exported, so make it static. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:10 +01:00
Rafael J. Wysocki	1112e9d83e	cpufreq: governor: Narrow down the dbs_data_mutex coverage Since cpufreq_governor_dbs() is now always called with policy->rwsem held, it cannot be executed twice in parallel for the same policy. Thus it is not necessary to hold dbs_data_mutex around the invocations of cpufreq_governor_start/stop/limits() from it as those functions never modify any data that can be shared between different policies. However, cpufreq_governor_dbs() may be executed twice in parallal for different policies using the same gov->gdbs_data object and dbs_data_mutex is still necessary to protect that object against concurrent updates. For this reason, narrow down the dbs_data_mutex locking to cpufreq_governor_init/exit() where it is needed and rename the mutex to gov_dbs_data_mutex to reflect its purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:10 +01:00
Rafael J. Wysocki	e3f5ed9393	cpufreq: governor: Make dbs_data_mutex static That mutex is only used by cpufreq_governor_dbs() and it doesn't need to be exported to modules, so make it static and drop the export incantation. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:09 +01:00
Rafael J. Wysocki	47ebaac1f3	cpufreq: governor: Relocate definitions of tuners structures Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners from the common governor header to the ondemand and conservative governor code, respectively, as they don't need to be in the common header any more. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:09 +01:00
Rafael J. Wysocki	8c8f77fd07	cpufreq: governor: Move per-CPU data to the common code After previous changes there is only one piece of code in the ondemand governor making references to per-CPU data structures, but it can be easily modified to avoid doing that, so modify it accordingly and move the definition of per-CPU data used by the ondemand and conservative governors to the common code. Next, change that code to access the per-CPU data structures directly rather than via a governor callback. This causes the ->get_cpu_cdbs governor callback to become unnecessary, so drop it along with the macro and function definitions related to it. Finally, drop the definitions of struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s that aren't necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:09 +01:00
Rafael J. Wysocki	7d5a9956af	cpufreq: governor: Make governor private data per-policy Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s are only used for a limited set of CPUs. Namely, if a policy is shared between multiple CPUs, those fields will only be used for one of them (policy->cpu). This means that they really are per-policy rather than per-CPU and holding room for them in per-CPU data structures is generally wasteful. Also moving those fields into per-policy data structures will allow some significant simplifications to be made going forward. For this reason, introduce struct cs_policy_dbs_info and struct od_policy_dbs_info to hold those fields. Define each of the new structures as an extension of struct policy_dbs_info (such that struct policy_dbs_info is embedded in each of them) and introduce new ->alloc and ->free governor callbacks to allocate and free those structures, respectively, such that ->alloc() will return a pointer to the struct policy_dbs_info embedded in the allocated data structure and ->free() will take that pointer as its argument. With that, modify the code accessing the data fields in question in per-CPU data objects to look for them in the new structures via the struct policy_dbs_info pointer available to it and drop them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:08 +01:00
Rafael J. Wysocki	d1db75fffc	cpufreq: ondemand: Rework the handling of powersave bias updates The ondemand_powersave_bias_init() function used for resetting data fields related to the powersave bias tunable of the ondemand governor works by walking all of the online CPUs in the system and updating the od_cpu_dbs_info_s structures for all of them. However, if governor tunables are per policy, the update should not touch the CPUs that are not associated with the given dbs_data. Moreover, since the data fields in question are only ever used for policy->cpu in each policy governed by ondemand, the update can be limited to those specific CPUs. Rework the code to take the above observations into account. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:08 +01:00
Rafael J. Wysocki	a33cce1c6c	cpufreq: governor: Fix CPU load information updates via ->store The ->store() callbacks of some tunable sysfs attributes of the ondemand and conservative governors trigger immediate updates of the CPU load information for all CPUs "governed" by the given dbs_data by walking the cpu_dbs_info structures for all online CPUs in the system and updating them. This is questionable for two reasons. First, it may lead to a lot of extra overhead on a system with many CPUs if the given dbs_data is only associated with a few of them. Second, if governor tunables are per-policy, the CPUs associated with the other sets of governor tunables should not be updated. To address this issue, use the observation that in all of the places in question the update operation may be carried out in the same way (because all of the tunables involved are now located in struct dbs_data and readily available to the common code) and make the code in those places invoke the same (new) helper function that will carry out the update correctly. That new function always checks the ignore_nice_load tunable value and updates the CPUs' prev_cpu_nice data fields if that's set, which wasn't done by the original code in store_io_is_busy(), but it should have been done in there too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:08 +01:00
Rafael J. Wysocki	76c5f66aa1	cpufreq: ondemand: Drop one more callback from struct od_ops The ->powersave_bias_init_cpu callback in struct od_ops is only used in one place and that invocation may be replaced with a direct call to the function pointed to by that callback, so change the code accordingly and drop the callback. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:07 +01:00
Rafael J. Wysocki	8434dadbb4	cpufreq: governor: Drop unused governor callback and data fields After some previous changes, the ->get_cpu_dbs_info_s governor callback and the "governor" field in struct dbs_governor (whose value represents the governor type) are not used any more, so drop them. Also drop the unused gov_ops field from struct dbs_governor. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:07 +01:00
Rafael J. Wysocki	702c9e542a	cpufreq: governor: Add a ->start callback for governors To avoid having to check the governor type explicitly in the common code in order to initialize data structures specific to the governor type properly, add a ->start callback to struct dbs_governor and use it to initialize those data structures for the ondemand and conservative governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:07 +01:00
Rafael J. Wysocki	8847e038c1	cpufreq: governor: Move io_is_busy to struct dbs_data The io_is_busy governor tunable is only used by the ondemand governor and is located in the ondemand-specific data structure, but it is looked at by the common governor code that has to do ugly things to get to that value, so move it to struct dbs_data and modify ondemand accordingly. Since the conservative governor never touches that field, it will be always 0 for that governor and it won't have any effect on the results of computations in that case. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:06 +01:00
Rafael J. Wysocki	574ef14d5d	cpufreq: governor: Close dbs_data update race condition It is possible for a dbs_data object to be updated after its usage counter has become 0. That may happen if governor_store() runs (via a govenor tunable sysfs attribute write) in parallel with cpufreq_governor_exit() called for the last cpufreq policy associated with the dbs_data in question. In that case, if governor_store() acquires dbs_data->mutex right after cpufreq_governor_exit() has released it, the ->store() callback invoked by it may operate on dbs_data with no users. Although sysfs will cause the kobject_put() in cpufreq_governor_exit() to block until governor_store() has returned, that situation may lead to some unexpected results, depending on the implementation of the ->store callback, and therefore it should be avoided. To that end, modify governor_store() to check the dbs_data's usage count before invoking the ->store() callback and return an error if it is 0 at that point. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:06 +01:00
Rafael J. Wysocki	8eb055d3f5	cpufreq: ondemand: Drop unused callback from struct od_ops The ->freq_increase callback in struct od_ops is never invoked, so drop it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:06 +01:00
Rafael J. Wysocki	a7f35cffb9	cpufreq: ondemand: Simplify od_update() slightly Drop some lines of code from od_update() by arranging the statements in there in a more logical way. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:05 +01:00
Rafael J. Wysocki	07aa4402a0	cpufreq: governor: Use microseconds in sample delay computations Do not convert microseconds to jiffies and the other way around in governor computations related to the sampling rate and sample delay and drop delay_for_sampling_rate() which isn't of any use then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:05 +01:00
Rafael J. Wysocki	6e96c5b3ac	cpufreq: ondemand: Simplify conditionals in od_dbs_timer() Reduce the indentation level in the conditionals in od_dbs_timer() and drop the delay variable from it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:05 +01:00
Rafael J. Wysocki	57dc3bcd45	cpufreq: governor: Move rate_mult to struct policy_dbs The rate_mult field in struct od_cpu_dbs_info_s is used by the code shared with the conservative governor and to access it that code has to do an ugly governor type check. However, first of all it is ever only used for policy->cpu, so it is per-policy rather than per-CPU and second, it is initialized to 1 by cpufreq_governor_start(), so if the conservative governor never modifies it, it will have no effect on the results of any computations. For these reasons, move rate_mult to struct policy_dbs_info (as a common field). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Rafael J. Wysocki	78347cdb89	cpufreq: governor: Reset sample delay in store_sampling_rate() If store_sampling_rate() updates the sample delay when the ondemand governor is in the middle of its high/low dance (OD_SUB_SAMPLE sample type is set), the governor will still do the bottom half of the previous sample which may take too much time. To prevent that from happening, change store_sampling_rate() to always reset the sample delay to 0 which also is consistent with the new behavior of cpufreq_governor_limits(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Rafael J. Wysocki	4cccf75557	cpufreq: governor: Get rid of the ->gov_check_cpu callback The way the ->gov_check_cpu governor callback is used by the ondemand and conservative governors is not really straightforward. Namely, the governor calls dbs_check_cpu() that updates the load information for the policy and the invokes ->gov_check_cpu() for the governor. To get rid of that entanglement, notice that cpufreq_governor_limits() doesn't need to call dbs_check_cpu() directly. Instead, it can simply reset the sample delay to 0 which will cause a sample to be taken immediately. The result of that is practically equivalent to calling dbs_check_cpu() except that it will trigger a full update of governor internal state and not just the ->gov_check_cpu() part. Following that observation, make cpufreq_governor_limits() reset the sample delay and turn dbs_check_cpu() into a function that will simply evaluate the load and return the result called dbs_update(). That function can now be called by governors from the routines that previously were pointed to by ->gov_check_cpu and those routines can be called directly by each governor instead of dbs_check_cpu(). This way ->gov_check_cpu becomes unnecessary, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Rafael J. Wysocki	57eb832f90	cpufreq: governor: Clean up load-related computations Clean up some load-related computations in dbs_check_cpu() and cpufreq_governor_start() to get rid of unnecessary operations and type casts and make the code easier to read. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:03 +01:00
Rafael J. Wysocki	679b8fe43a	cpufreq: governor: Fix nice contribution computation in dbs_check_cpu() The contribution of the CPU nice time to the idle time in dbs_check_cpu() is computed in a bogus way, as the code may subtract current and previous nice values for different CPUs. That doesn't matter for cases when cpufreq policies are not shared, but may lead to problems otherwise. Fix the computation and simplify it to avoid taking unnecessary steps. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:03 +01:00
Rafael J. Wysocki	e4db2813d2	cpufreq: governor: Avoid atomic operations in hot paths Rework the handling of work items by dbs_update_util_handler() and dbs_work_handler() so the former (which is executed in scheduler paths) only uses atomic operations when absolutely necessary. That is, when the policy is shared and dbs_update_util_handler() has already decided that this is the time to queue up a work item. In particular, this avoids the atomic ops entirely on platforms where policy objects are never shared. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:03 +01:00
Rafael J. Wysocki	f62b93740c	cpufreq: governor: Simplify gov_cancel_work() slightly The atomic work counter incrementation in gov_cancel_work() is not necessary any more, because work items won't be queued up after gov_clear_update_util() anyway, so drop it along with the comment about how it may be missed by the gov_clear_update_util(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:02 +01:00
Rafael J. Wysocki	b9db42730a	cpufreq: governor: Avoid irq_work_queue_on() crash on non-SMP ARM As it turns out, irq_work_queue_on() will crash if invoked on non-SMP ARM platforms, but in fact it is not necessary to use that function in the cpufreq governor code (as it doesn't matter to that code which CPU will handle the irq_work), so change it to always use irq_work_queue(). Fixes: 8fb47ff100af (cpufreq: governor: Replace timers with utilization update callbacks) Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Reported-and-tested-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:02 +01:00
Viresh Kumar	a23d6d1809	cpufreq: ondemand: Rearrange od_dbs_timer() to avoid updating delay Avoid extra checks in od_dbs_timer() by rearranging updates to the local delay variable in it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:02 +01:00
Viresh Kumar	aded387b94	cpufreq: conservative: Update sample_delay_ns immediately The ondemand governor already updates sample_delay_ns immediately on updates to the sampling rate, but conservative doesn't do that. It was left out earlier as the code was really too complex to get that done easily. Things are sorted out very well now, however, and the conservative governor can be modified to follow ondemand in that respect. Moreover, since the code needed to implement that in the conservative governor would be identical to the corresponding ondemand governor's code, make that code common and change both governors to use it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:01 +01:00
Viresh Kumar	581c214b21	cpufreq: governor: No need to manage state machine now The cpufreq core now guarantees that policy->rwsem won't be dropped while running the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT event and will be held acquired until the complete sequence of governor state changes has finished. This allows governor state machine checks to be dropped from multiple functions in cpufreq_governor.c. This also means that policy_dbs->policy can be initialized upfront, so the entire initialization of struct policy_dbs can be carried out in one place. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:01 +01:00
Viresh Kumar	99522fe678	cpufreq: Remove cpufreq_governor_lock We used to drop policy->rwsem just before calling __cpufreq_governor() in some cases earlier and so it was possible that __cpufreq_governor() ran concurrently via separate threads for the same policy. In order to guarantee valid state transitions for governors, 'governor_enabled' was required to be protected using some locking and cpufreq_governor_lock was added for that. But now __cpufreq_governor() is always called under policy->rwsem, and 'governor_enabled' is protected against races even without cpufreq_governor_lock. Get rid of the extra lock now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw : Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:01 +01:00
Viresh Kumar	49f18560f8	cpufreq: Call __cpufreq_governor() with policy->rwsem held The cpufreq core code is not consistent with respect to invoking __cpufreq_governor() under policy->rwsem. Changing all code to always hold policy->rwsem around __cpufreq_governor() invocations will allow us to remove cpufreq_governor_lock that is used today because we can't guarantee that __cpufreq_governor() isn't executed twice in parallel for the same policy. We should also ensure that policy->rwsem is held across governor state changes. For example, while adding a CPU to the policy in the CPU online path, we need to stop the governor, change policy->cpus, start the governor and then refresh its limits. The complete sequence must be guaranteed to complete without interruptions by concurrent governor state updates. That can be achieved by holding policy->rwsem around those sequences of operations. Also note that after this patch cpufreq_driver->stop_cpu() and ->exit() will get called under policy->rwsem which wasn't the case earlier. That shouldn't have any side effects, though. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:00 +01:00
Viresh Kumar	69cee7147b	cpufreq: Merge cpufreq_offline_prepare/finish routines Commit `1aee40ac9c` (cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock) split the cpufreq's CPU offline routine in two pieces, one of them to be run with CPU offline/online locked and the other to be called later. The reason for that split was a possible deadlock scenario involving cpufreq sysfs attributes and CPU offline. However, the handling of CPU offline in cpufreq has changed since then. Policy sysfs attributes are never removed during CPU offline, so there's no need to worry about accessing them during CPU offline, because that can't lead to any deadlocks now. Governor sysfs attributes are still removed in __cpufreq_governor(_EXIT), but there is a new kobject type for them now and its show/store callbacks don't lock CPU offline/online (they don't need to do that). This means that the CPU offline code in cpufreq doesn't need to be split any more, so combine cpufreq_offline_prepare() with cpufreq_offline_finish(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:00 +01:00
Viresh Kumar	c54df07184	cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlock The dbs_data_mutex lock is currently used in two places. First, cpufreq_governor_dbs() uses it to guarantee mutual exclusion between invocations of governor operations from the core. Second, it is used by ondemand governor's update_sampling_rate() to ensure the stability of data structures walked by it. The second usage is quite problematic, because update_sampling_rate() is called from a governor sysfs attribute's ->store callback and that leads to a deadlock scenario involving cpufreq_governor_exit() which runs under dbs_data_mutex. Thus it is better to rework the code so update_sampling_rate() doesn't need to acquire dbs_data_mutex. To that end, rework update_sampling_rate() to walk a list of policy_dbs objects supported by the dbs_data one it has been called for (instead of walking cpu_dbs_info object for all CPUs). The list manipulation is protected with dbs_data->mutex which also is held around the execution of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in that function any more. Reported-by: Juri Lelli <juri.lelli@arm.com> Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:59 +01:00
Viresh Kumar	68e80dae09	Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Earlier, when the struct freq-attr was used to represent governor attributes, the standard cpufreq show/store sysfs attribute callbacks were applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That could have resulted in an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. The previous commit, "cpufreq: governor: New sysfs show/store callbacks for governor tunables", fixed the original ABBA deadlock by adding new governor specific show/store callbacks. We don't have to drop rwsem around invocations of governor event CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now. Fixes: `955ef48335` (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reported-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:59 +01:00
Viresh Kumar	fd8ddc482a	cpufreq: governor: Drop unused macros for creating governor tunable attributes The previous commit introduced a new set of macros for creating sysfs attributes that represent governor tunables and the old macros used for this purpose are not needed any more, so drop them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:58 +01:00

1 2 3 4 5 ...

1949 Commits