linux

Author	SHA1	Message	Date
Vitaly Kuznetsov	0e4d394fe5	xen/x86: Don't BUG on CPU0 offlining CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows to offline CPU0 but Xen HVM guests BUG() in xen_teardown_timer(). Remove the BUG_ON(), this is probably a leftover from ancient times when CPU0 hotplug was impossible, it works just fine for HVM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-07-23 08:09:24 +02:00
Boris Ostrovsky	d162809f85	xen/x86: Do not call xen_init_time_ops() until shared_info is initialized Routines that are set by xen_init_time_ops() use shared_info's pvclock_vcpu_time_info area. This area is not properly available until shared_info is mapped in xen_setup_shared_info(). This became especially problematic due to commit `dd759d93f4` ("x86/timers: Add simple udelay calibration") where we end up reading tsc_to_system_mul from xen_dummy_shared_info (i.e. getting zero value) and then trying to divide by it in pvclock_tsc_khz(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-05 10:43:15 +02:00
Boris Ostrovsky	84d582d236	xen: Revert commits `da72ff5bfc` and `72a9b18629` Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741) established that commit `72a9b18629` ("xen: Remove event channel notification through Xen PCI platform device") (and thus commit `da72ff5bfc` ("partially revert "xen: Remove event channel notification through Xen PCI platform device"")) are unnecessary and, in fact, prevent HVM guests from booting on Xen releases prior to 4.0 Therefore we revert both of those commits. The summary of that discussion is below: Here is the brief summary of the current situation: Before the offending commit (`72a9b18629`): 1) INTx does not work because of the reset_watches path. 2) The reset_watches path is only taken if you have Xen > 4.0 3) The Linux Kernel by default will use vector inject if the hypervisor support. So even INTx does not work no body running the kernel with Xen > 4.0 would notice. Unless he explicitly disabled this feature either in the kernel or in Xen (and this can only be disabled by modifying the code, not user-supported way to do it). After the offending commit (+ partial revert): 1) INTx is no longer support for HVM (only for PV guests). 2) Any HVM guest The kernel will not boot on Xen < 4.0 which does not have vector injection support. Since the only other mode supported is INTx which. So based on this summary, I think before commit (`72a9b18629`) we were in much better position from a user point of view. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>	2017-05-02 11:18:05 +02:00
Nicolai Stange	3d18d661aa	x86/xen/time: Set ->min_delta_ticks and ->max_delta_ticks In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the x86 arch's xen clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2017-04-14 13:11:03 -07:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
KarimAllah Ahmed	72a9b18629	xen: Remove event channel notification through Xen PCI platform device Ever since commit `254d1a3f02` ("xen/pv-on-hvm kexec: shutdown watches from old kernel") using the INTx interrupt from Xen PCI platform device for event channel notification would just lockup the guest during bootup. postcore_initcall now calls xs_reset_watches which will eventually try to read a value from XenStore and will get stuck on read_reply at XenBus forever since the platform driver is not probed yet and its INTx interrupt handler is not registered yet. That means that the guest can not be notified at this moment of any pending event channels and none of the per-event handlers will ever be invoked (including the XenStore one) and the reply will never be picked up by the kernel. The exact stack where things get stuck during xenbus_init: -xenbus_init -xs_init -xs_reset_watches -xenbus_scanf -xenbus_read -xs_single -xs_single -xs_talkv Vector callbacks have always been the favourite event notification mechanism since their introduction in commit `38e20b07ef` ("x86/xen: event channels delivery on HVM.") and the vector callback feature has always been advertised for quite some time by Xen that's why INTx was broken for several years now without impacting anyone. Luckily this also means that event channel notification through INTx is basically dead-code which can be safely removed without impacting anybody since it has been effectively disabled for more than 4 years with nobody complaining about it (at least as far as I'm aware of). This commit removes event channel notification through Xen PCI platform device. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Julien Grall <julien.grall@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: xen-devel@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-09-30 11:44:34 +01:00
Juergen Gross	d34c30cc1f	xen: add static initialization of steal_clock op to xen_time_ops pv_time_ops might be overwritten with xen_time_ops after the steal_clock operation has been initialized already. To prevent calling a now uninitialized function pointer add the steal_clock static initialization to xen_time_ops. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-07-26 14:07:06 +01:00
Vitaly Kuznetsov	ad5475f9fa	x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op HYPERVISOR_vcpu_op() passes Linux's idea of vCPU id as a parameter while Xen's idea is expected. In some cases these ideas diverge so we need to do remapping. Convert all callers of HYPERVISOR_vcpu_op() to use xen_vcpu_nr(). Leave xen_fill_possible_map() and xen_filter_cpu_maps() intact as they're only being called by PV guests before perpu areas are initialized. While the issue could be solved by switching to early_percpu for xen_vcpu_id I think it's not worth it: PV guests will probably never get to the point where their idea of vCPU id diverges from Xen's. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-07-25 13:32:34 +01:00
Juergen Gross	ecb23dc6f2	xen: add steal_clock support on x86 The pv_time_ops structure contains a function pointer for the "steal_clock" functionality used only by KVM and Xen on ARM. Xen on x86 uses its own mechanism to account for the "stolen" time a thread wasn't able to run due to hypervisor scheduling. Add support in Xen arch independent time handling for this feature by moving it out of the arm arch into drivers/xen and remove the x86 Xen hack. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2016-07-06 10:34:48 +01:00
Stefano Stabellini	c06b6d70fe	xen/x86: don't lose event interrupts On slow platforms with unreliable TSC, such as QEMU emulated machines, it is possible for the kernel to request the next event in the past. In that case, in the current implementation of xen_vcpuop_clockevent, we simply return -ETIME. To be precise the Xen returns -ETIME and we pass it on. However the result of this is a missed event, which simply causes the kernel to hang. Instead it is better to always ask the hypervisor for a timer event, even if the timeout is in the past. That way there are no lost interrupts and the kernel survives. To do that, remove the VCPU_SSHOTTMR_future flag. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Juergen Gross <jgross@suse.com>	2016-05-24 12:58:17 +01:00
Stefano Stabellini	187b26a972	xen/x86: convert remaining timespec to timespec64 in xen_pvclock_gtod_notify Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-12-21 14:40:59 +00:00
Stefano Stabellini	7609686313	xen/x86: support XENPF_settime64 Try XENPF_settime64 first, if it is not available fall back to XENPF_settime32. No need to call __current_kernel_time() when all the info needed are already passed via the struct timekeeper * argument. Return NOTIFY_BAD in case of errors. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-12-21 14:40:59 +00:00
Stefano Stabellini	f3d6027ee0	xen: introduce XENPF_settime64 Rename the current XENPF_settime hypercall and related struct to XENPF_settime32. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-12-21 14:40:57 +00:00
Stefano Stabellini	cfafae9403	xen: rename dom0_op to platform_op The dom0_op hypercall has been renamed to platform_op since Xen 3.2, which is ancient, and modern upstream Linux kernels cannot run as dom0 and it anymore anyway. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2015-12-21 14:40:55 +00:00
Stefano Stabellini	4ccefbe597	xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2015-12-21 14:40:52 +00:00
Viresh Kumar	955381dd65	x86/xen/time: Migrate to new set-state interface Migrate xen driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now. This also enables us to implement callbacks for new states of clockevent devices, for example: ONESHOT_STOPPED. Callbacks aren't implemented for modes where we weren't doing anything. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org (moderated list:XEN HYPERVISOR INTERFACE) Link: http://lkml.kernel.org/r/881eea6e1a3d483cd33e044cd34827cce26a57fd.1437042675.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2015-07-30 21:25:38 +02:00
Palik, Imre	94dd85f6a0	x86/xen: prefer TSC over xen clocksource for dom0 In Dom0's the use of the TSC clocksource (whenever it is stable enough to be used) instead of the Xen clocksource should not cause any issues, as Dom0 VMs never live-migrated. The TSC clocksource is somewhat more efficient than the Xen paravirtualised clocksource, thus it should have higher rating. This patch decreases the rating of the Xen clocksource in Dom0s to 275. Which is half-way between the rating of the TSC clocksource (300) and the hpet clocksource (250). Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Imre Palik <imrep@amazon.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-20 18:44:24 +00:00
Vitaly Kuznetsov	7be0772d19	x86/xen: avoid freeing static 'name' when kasprintf() fails In case kasprintf() fails in xen_setup_timer() we assign name to the static string "<timer kasprintf failed>". We, however, don't check that fact before issuing kfree() in xen_teardown_timer(), kernel is supposed to crash with 'kernel BUG at mm/slub.c:3341!' Solve the issue by making name a fixed length string inside struct xen_clock_event_device. 16 bytes should be enough. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2015-01-08 13:55:25 +00:00
Boris Ostrovsky	8b8cd8a367	x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer() There is no reason for having it and, with commit `250a1ac685` ("x86, smpboot: Remove pointless preempt_disable() in native_smp_prepare_cpus()"), it prevents HVM guests from booting. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-12-23 10:31:55 +00:00
Boris Ostrovsky	3251f20b89	x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read() Commit `89cbc76768` ("x86: Replace __get_cpu_var uses") replaced __get_cpu_var() with this_cpu_ptr() in xen_clocksource_read() in such a way that instead of accessing a structure pointed to by a per-cpu pointer we are trying to get to a per-cpu structure. __this_cpu_read() of the pointer is the more appropriate accessor. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>	2014-10-23 16:24:02 +01:00
Christoph Lameter	89cbc76768	x86: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:49 -04:00
David Vrabel	8d5999df35	x86/xen: resume timer irqs early If the timer irqs are resumed during device resume it is possible in certain circumstances for the resume to hang early on, before device interrupts are resumed. For an Ubuntu 14.04 PVHVM guest this would occur in ~0.5% of resume attempts. It is not entirely clear what is occuring the point of the hang but I think a task necessary for the resume calls schedule_timeout(), waiting for a timer interrupt (which never arrives). This failure may require specific tasks to be running on the other VCPUs to trigger (processes are not frozen during a suspend/resume if PREEMPT is disabled). Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in syscore_resume(). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org	2014-08-11 11:59:34 +01:00
David Vrabel	8785c67663	xen/x86: set VIRQ_TIMER priority to maximum Commit `bee980d9e` (xen/events: Handle VIRQ_TIMER before any other hardirq in event loop) effectively made the VIRQ_TIMER the highest priority event when using the 2-level ABI. Set the VIRQ_TIMER priority to the highest so this behaviour is retained when using the FIFO-based ABI. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2014-01-06 10:07:55 -05:00
Michael Opdenacker	9d71cee667	x86/xen: remove deprecated IRQF_DISABLED This patch proposes to remove the IRQF_DISABLED flag from x86/xen code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-11-06 15:31:01 -05:00
Linus Torvalds	21884a83b2	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "The timer changes contain: - posix timer code consolidation and fixes for odd corner cases - sched_clock implementation moved from ARM to core code to avoid duplication by other architectures - alarm timer updates - clocksource and clockevents unregistration facilities - clocksource/events support for new hardware - precise nanoseconds RTC readout (Xen feature) - generic support for Xen suspend/resume oddities - the usual lot of fixes and cleanups all over the place The parts which touch other areas (ARM/XEN) have been coordinated with the relevant maintainers. Though this results in an handful of trivial to solve merge conflicts, which we preferred over nasty cross tree merge dependencies. The patches which have been committed in the last few days are bug fixes plus the posix timer lot. The latter was in akpms queue and next for quite some time; they just got forgotten and Frederic collected them last minute." * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) hrtimer: Remove unused variable hrtimers: Move SMP function call to thread context clocksource: Reselect clocksource when watchdog validated high-res capability posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting posix_timers: fix racy timer delta caching on task exit posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() selftests: add basic posix timers selftests posix_cpu_timers: consolidate expired timers check posix_cpu_timers: consolidate timer list cleanups posix_cpu_timer: consolidate expiry time type tick: Sanitize broadcast control logic tick: Prevent uncontrolled switch to oneshot mode tick: Make oneshot broadcast robust vs. CPU offlining x86: xen: Sync the CMOS RTC as well as the Xen wallclock x86: xen: Sync the wallclock when the system time is set timekeeping: Indicate that clock was set in the pvclock gtod notifier timekeeping: Pass flags instead of multiple bools to timekeeping_update() xen: Remove clock_was_set() call in the resume path hrtimers: Support resuming with two or more CPUs online (but stopped) timer: Fix jiffies wrap behavior of round_jiffies_common() ...	2013-07-06 14:09:38 -07:00
David Vrabel	47433b8c9d	x86: xen: Sync the CMOS RTC as well as the Xen wallclock Adjustments to Xen's persistent clock via update_persistent_clock() don't actually persist, as the Xen wallclock is a software only clock and modifications to it do not modify the underlying CMOS RTC. The x86_platform.set_wallclock hook is there to keep the hardware RTC synchronized. On a guest this is pointless. On Dom0 we can use the native implementaion which actually updates the hardware RTC, but we still need to keep the software emulation of RTC for the guests up to date. The subscription to the pvclock_notifier allows us to emulate this easily. The notifier is called at every tick and when the clock was set. Right now we only use that notifier when the clock was set, but due to the fact that it is called periodically from the timekeeping update code, we can utilize it to emulate the NTP driven drift compensation of update_persistant_clock() for the Xen wall (software) clock. Add a 11 minutes periodic update to the pvclock_gtod notifier callback to achieve that. The static variable 'next' which maintains that 11 minutes update cycle is protected by the core code serialization so there is no need to add a Xen specific serialization mechanism. [ tglx: Massaged changelog and added a few comments ] Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-06-28 23:15:07 +02:00
David Vrabel	5584880e44	x86: xen: Sync the wallclock when the system time is set Currently the Xen wallclock is only updated every 11 minutes if NTP is synchronized to its clock source (using the sync_cmos_clock() work). If a guest is started before NTP is synchronized it may see an incorrect wallclock time. Use the pvclock_gtod notifier chain to receive a notification when the system time has changed and update the wallclock to match. This chain is called on every timer tick and we want to avoid an extra (expensive) hypercall on every tick. Because dom0 has historically never provided a very accurate wallclock and guests do not expect one, we can do this simply: the wallclock is only updated if the clock was set. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-06-28 23:15:06 +02:00
Laszlo Ersek	0b0c002c34	xen/time: remove blocked time accounting from xen "clockchip" ... because the "clock_event_device framework" already accounts for idle time through the "event_handler" function pointer in xen_timer_interrupt(). The patch is intended as the completion of [1]. It should fix the double idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to stolen time accounting (the removed code seems to be isolated). The approach may be completely misguided. [1] https://lkml.org/lkml/2011/10/6/10 [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html John took the time to retest this patch on top of v3.10 and reported: "idle time is correctly incremented for pv and hvm for the normal case, nohz=off and nohz=idle." so lets put this patch in. CC: stable@vger.kernel.org Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: John Haxby <john.haxby@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-28 12:11:39 -04:00
Konrad Rzeszutek Wilk	09e99da766	xen/time: Free onlined per-cpu data structure if we want to online it again. If the per-cpu time data structure has been onlined already and we are trying to online it again, then free the previous copy before blindly over-writting it. A developer naturally should not call this function multiple times but just in case. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-10 08:43:37 -04:00
Konrad Rzeszutek Wilk	a05e2c371f	xen/time: Check that the per_cpu data structure has data before freeing. We don't check whether the per_cpu data structure has actually been freed in the past. This checks it and if it has been freed in the past then just continues on without double-freeing. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-10 08:43:36 -04:00
Konrad Rzeszutek Wilk	c9d76a24a2	xen/time: Don't leak interrupt name when offlining. When the user does: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online kmemleak reports: kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak) One of the leaks is from xen/time: unreferenced object 0xffff88003fa51280 (size 32): comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s) hex dump (first 32 bytes): 74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00 timer1.......... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81660721>] kmemleak_alloc+0x21/0x50 [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0 [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90 [<ffffffff812fe228>] kasprintf+0x38/0x40 [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0 [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8 [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b [<ffffffff8166bd48>] cpu_up+0xd9/0xec [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3 [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6 [<ffffffff8165ce39>] kernel_init+0x9/0xf0 [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0 [<ffffffffffffffff>] 0xffffffffffffffff This patch fixes it by stashing away the 'name' in the per-cpu data structure and freeing it when offlining the CPU. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-10 08:43:35 -04:00
Konrad Rzeszutek Wilk	31620a198c	xen/time: Encapsulate the struct clock_event_device in another structure. We don't do any code movement. We just encapsulate the struct clock_event_device in a new structure which contains said structure and a pointer to a char *name. The 'name' will be used in 'xen/time: Don't leak interrupt name when offlining'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-06-10 08:43:34 -04:00
David Vrabel	3565184ed0	x86: Increase precision of x86_platform.get/set_wallclock() All the virtualized platforms (KVM, lguest and Xen) have persistent wallclocks that have more than one second of precision. read_persistent_wallclock() and update_persistent_wallclock() allow for nanosecond precision but their implementation on x86 with x86_platform.get/set_wallclock() only allows for one second precision. This means guests may see a wallclock time that is off by up to 1 second. Make set_wallclock() and get_wallclock() take a struct timespec parameter (which allows for nanosecond precision) so KVM and Xen guests may start with a more accurate wallclock time and a Xen dom0 can maintain a more accurate wallclock for guests. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2013-05-28 14:00:59 -07:00
Konrad Rzeszutek Wilk	ef35a4e6d9	xen/time: Add default value of -1 for IRQ and check for that. If the timer interrupt has been de-init or is just now being initialized, the default value of -1 should be preset as interrupt line. Check for that and if something is odd WARN us. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-04-16 16:05:14 -04:00
Konrad Rzeszutek Wilk	7918c92ae9	xen/time: Fix kasprintf splat when allocating timer%d IRQ line. When we online the CPU, we get this splat: smpboot: Booting Node 0 Processor 1 APIC 0x2 installing Xen timer for CPU 1 BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1 Call Trace: [<ffffffff810c1fea>] __might_sleep+0xda/0x100 [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff813036eb>] kvasprintf+0x5b/0x90 [<ffffffff81303758>] kasprintf+0x38/0x40 [<ffffffff81044510>] xen_setup_timer+0x30/0xb0 [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30 [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8 The solution to that is use kasprintf in the CPU hotplug path that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify, and remove the call to in xen_hvm_setup_cpu_clockevents. Unfortunatly the later is not a good idea as the bootup path does not use xen_hvm_cpu_notify so we would end up never allocating timer%d interrupt lines when booting. As such add the check for atomic() to continue. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2013-04-16 16:05:07 -04:00
Linus Torvalds	403299a851	Merge branch 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/dom0: set wallclock time in Xen xen: add dom0_op hypercall xen/acpi: Domain0 acpi parser related platform hypercall	2011-11-06 20:15:05 -08:00
Jeremy Fitzhardinge	fdb9eb9f15	xen/dom0: set wallclock time in Xen Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-09-26 11:04:39 -07:00
Jeremy Fitzhardinge	f1c39625d6	xen: use non-tracing preempt in xen_clocksource_read() The tracing code used sched_clock() to get tracing timestamps, which ends up calling xen_clocksource_read(). xen_clocksource_read() must disable preemption, but if preemption tracing is enabled, this results in infinite recursion. I've only noticed this when boot-time tracing tests are enabled, but it seems like a generic bug. It looks like it would also affect kvm_clocksource_read(). Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com>	2011-08-24 09:54:24 -07:00
Linus Torvalds	0f1bdc1815	Merge branch 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: convert mips to generic i8253 clocksource clocksource: convert x86 to generic i8253 clocksource clocksource: convert footbridge to generic i8253 clocksource clocksource: add common i8253 PIT clocksource blackfin: convert to clocksource_register_hz mips: convert to clocksource_register_hz/khz sparc: convert to clocksource_register_hz/khz alpha: convert to clocksource_register_hz microblaze: convert to clocksource_register_hz/khz ia64: convert to clocksource_register_hz/khz x86: Convert remaining x86 clocksources to clocksource_register_hz/khz Make clocksource name const	2011-05-19 17:44:13 -07:00
Daniel Kiper	fb6ce5dea4	arch/x86/xen/time: Cleanup code/data sections definitions Cleanup code/data sections definitions accordingly to include/linux/init.h. Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-19 11:30:39 -04:00
Thomas Gleixner	a18f22a968	Merge branch 'consolidate-clksrc-i8253' of master.kernel.org:~rmk/linux-2.6-arm into timers/clocksource Conflicts: arch/ia64/kernel/cyclone.c arch/mips/kernel/i8253.c arch/x86/kernel/i8253.c Reason: Resolve conflicts so further cleanups do not conflict further Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-05-14 12:06:36 +02:00
Ian Campbell	f611f2da99	xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-03-03 12:00:31 -05:00
John Stultz	b01cc1b0ea	x86: Convert remaining x86 clocksources to clocksource_register_hz/khz This converts the remaining x86 clocksources to use clocksource_register_hz/khz. CC: jacob.jun.pan@intel.com CC: Glauber Costa <glommer@redhat.com> CC: Dimitri Sivanich <sivanich@sgi.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Jeremy Fitzhardinge <jeremy@xensource.com> CC: Chris McDermott <lcm@us.ibm.com> CC: Thomas Gleixner <tglx@linutronix.de> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen] Signed-off-by: John Stultz <johnstul@us.ibm.com>	2011-02-21 13:33:33 -08:00
Tejun Heo	275c8b9328	Merge branch 'this_cpu_ops' into for-2.6.38	2010-12-17 15:16:46 +01:00
Christoph Lameter	780f36d8b3	xen: Use this_cpu_ops Use this_cpu_ops to reduce code size and simplify things in various places. V3->V4: Move instance of this_cpu_inc_return to a later patchset so that this patch can be applied without infrastructure changes. Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-17 15:07:19 +01:00
Jeremy Fitzhardinge	e7a3481c02	x86/pvclock: Zero last_value on resume If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-28 09:33:20 +01:00
Stefano Stabellini	31e7e931cd	xen: do not initialize PV timers on HVM if !xen_have_vector_callback if !xen_have_vector_callback do not initialize PV timer unconditionally because we still don't know how many cpus are available and if there is more than one we won't be able to receive the timer interrupts on cpu > 0. This patch fixes an hang at boot when Xen does not support vector callbacks and the guest has multiple vcpus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>	2010-10-05 13:39:23 +01:00
Jeremy Fitzhardinge	ca50a5f390	Merge branch 'upstream/pvhvm' into upstream/xen * upstream/pvhvm: Introduce CONFIG_XEN_PVHVM compile option blkfront: do not create a PV cdrom device if xen_hvm_guest support multiple .discard.* sections to avoid section type conflicts xen/pvhvm: fix build problem when !CONFIG_XEN xenfs: enable for HVM domains too x86: Call HVMOP_pagetable_dying on exit_mmap. x86: Unplug emulated disks and nics. x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock. xen: Fix find_unbound_irq in presence of ioapic irqs. xen: Add suspend/resume support for PV on HVM guests. xen: Xen PCI platform device driver. x86/xen: event channels delivery on HVM. x86: early PV on HVM features initialization. xen: Add support for HVM hypercalls. Conflicts: arch/x86/xen/enlighten.c arch/x86/xen/time.c	2010-08-04 14:49:16 -07:00
Jeremy Fitzhardinge	a70ce4b606	Merge branch 'upstream/core' into upstream/xen * upstream/core: xen/panic: use xen_reboot and fix smp_send_stop Xen: register panic notifier to take crashes of xen guests on panic xen: support large numbers of CPUs with vcpu info placement xen: drop xen_sched_clock in favour of using plain wallclock time pvops: do not notify callers from register_xenstore_notifier xen: make sure pages are really part of domain before freeing xen: release unused free memory	2010-08-04 14:49:05 -07:00
Jeremy Fitzhardinge	8a22b9996b	xen: drop xen_sched_clock in favour of using plain wallclock time xen_sched_clock only counts unstolen time. In principle this should be useful to the Linux scheduler so that it knows how much time a process actually consumed. But in practice this doesn't work very well as the scheduler expects the sched_clock time to be synchronized between cpus. It also uses sched_clock to measure the time a task spends sleeping, in which case "unstolen time" isn't meaningful. So just use plain xen_clocksource_read to return wallclock nanoseconds for sched_clock. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-08-04 14:47:29 -07:00

1 2

78 Commits