Commit Graph

298966 Commits

Author SHA1 Message Date
Scott Wood
043cc4d724 KVM: PPC: factor out lpid allocator from book3s_64_mmu_hv
We'll use it on e500mc as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:51:02 +03:00
Scott Wood
06aae86799 powerpc/e500: split CPU_FTRS_ALWAYS/CPU_FTRS_POSSIBLE
Split e500 (v1/v2) and e500mc/e5500 to allow optimization of feature
checks that differ between the two.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:50:54 +03:00
Scott Wood
52b066fa4e powerpc/booke: Set CPU_FTR_DEBUG_LVL_EXC on 32-bit
Currently 32-bit only cares about this for choice of exception
vector, which is done in core-specific code.  However, KVM will
want to distinguish as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:50:31 +03:00
Takuya Yoshikawa
93474b25af KVM: Remove unused dirty_bitmap_head and nr_dirty_pages
Now that we do neither double buffering nor heuristic selection of the
write protection method these are not needed anymore.

Note: some drivers have their own implementation of set_bit_le() and
making it generic needs a bit of work; so we use test_and_set_bit_le()
and will later replace it with generic set_bit_le().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:50:01 +03:00
Takuya Yoshikawa
60c34612b7 KVM: Switch to srcu-less get_dirty_log()
We have seen some problems of the current implementation of
get_dirty_log() which uses synchronize_srcu_expedited() for updating
dirty bitmaps; e.g. it is noticeable that this sometimes gives us ms
order of latency when we use VGA displays.

Furthermore the recent discussion on the following thread
    "srcu: Implement call_srcu()"
    http://lkml.org/lkml/2012/1/31/211
also motivated us to implement get_dirty_log() without SRCU.

This patch achieves this goal without sacrificing the performance of
both VGA and live migration: in practice the new code is much faster
than the old one unless we have too many dirty pages.

Implementation:

The key part of the implementation is the use of xchg() operation for
clearing dirty bits atomically.  Since this allows us to update only
BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap
until every dirty bit is cleared again for the next call.

Although some people may worry about the problem of using the atomic
memory instruction many times to the concurrently accessible bitmap,
it is usually accessed with mmu_lock held and we rarely see concurrent
accesses: so what we need to care about is the pure xchg() overheads.

Another point to note is that we do not use for_each_set_bit() to check
which pages in each group of BITS_PER_LONG are actually dirty.  Instead we
simply use __ffs() in a loop.  This is much faster than repeatedly calling
find_next_bit().
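
The loop described above can be sketched in userspace C. This is only an illustration, not the kernel code: C11 atomic_exchange() stands in for xchg(), the GCC/Clang builtin __builtin_ctzl() stands in for __ffs(), and the name harvest_dirty() and its parameters are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Harvest and clear dirty bits one word at a time.
 * Returns the number of dirty pages found and stores the first
 * max_pages page indices in pages[].
 */
size_t harvest_dirty(atomic_ulong *bitmap, size_t nwords,
                     size_t *pages, size_t max_pages)
{
    size_t found = 0;

    assert(bitmap != NULL && pages != NULL);
    for (size_t i = 0; i < nwords; i++) {
        /* Skip clean words without paying for an atomic write. */
        if (atomic_load(&bitmap[i]) == 0)
            continue;

        /* Take this word's dirty bits and leave zero behind, atomically. */
        unsigned long mask = atomic_exchange(&bitmap[i], 0UL);

        while (mask != 0) {
            /* Index of the lowest set bit, playing the role of __ffs(). */
            unsigned bit = (unsigned)__builtin_ctzl(mask);

            if (found < max_pages)
                pages[found] = i * BITS_PER_WORD + bit;
            found++;
            mask &= mask - 1; /* clear the lowest set bit */
        }
    }
    return found;
}
```

Each word is exchanged at most once per call, so the cost is one atomic operation per dirty word plus one bit scan per dirty page.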

Performance:

The dirty-log-perf unit test showed nice improvements, some times faster
than before, except for some extreme cases; in such cases we get dirty
page information much faster than userspace can process it.

For real workloads, both VGA and live migration, we have observed pure
improvements: when the guest was reading a file during live migration,
we originally saw a few ms of latency, but with the new method the
latency was less than 200us.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:50:00 +03:00
Takuya Yoshikawa
5dc99b2380 KVM: Avoid checking huge page mappings in get_dirty_log()
We dropped such mappings when we enabled dirty logging, and we will never
create new ones until we stop the logging.

For this we introduce a new function which can be used to write protect
a range of PT level pages: although we do not need to care about a range
of pages at this point, the following patch will need this feature to
optimize the write protection of many pages.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:58 +03:00
Takuya Yoshikawa
a0ed46073c KVM: MMU: Split the main body of rmap_write_protect() off from others
We will use this in the following patch to implement another function
which needs to write protect pages using the rmap information.

Note that there is a small change in debug printing for large pages:
we do not differentiate them from others to avoid duplicating code.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:56 +03:00
Eric B Munson
248997095d kvmclock: remove unneeded EXPORT macro
check_and_clear_guest_paused does not need to be exported because it isn't
used by any modules.  Remove the export.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:54 +03:00
Marcelo Tosatti
8c84780df9 KVM: fix kvm_vcpu_kick build failure on S390
S390's kvm_vcpu_stat does not contain halt_wakeup member.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:42 +03:00
Eric B Munson
5d1c0f4a80 watchdog: add check for suspended vm in softlockup detector
A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:03 +03:00
Eric B Munson
1c0b28c2a4 KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRL
Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:49:01 +03:00
Eric B Munson
3b5d56b931 kvmclock: Add functions to check if the host has stopped the vm
When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
asm-generic changes Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:48:59 +03:00
Eric B Munson
eae3ee7d8a x86: pvclock: Add flag to indicate that a vm was stopped by the host
This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.  The host will set the flag when it stops the guest.  On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.
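
The handshake described above can be sketched as follows. This is a userspace illustration under stated assumptions: struct clock_info, the bit value GUEST_STOPPED_FLAG, host_mark_guest_stopped(), should_warn_soft_lockup() and the signature given to check_and_clear_guest_paused() are all hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define GUEST_STOPPED_FLAG 0x2u /* hypothetical bit value */

/* Hypothetical stand-in for the clock area shared by host and guest. */
struct clock_info {
    atomic_uint flags;
};

/* Host side: record that the guest was stopped. */
void host_mark_guest_stopped(struct clock_info *ci)
{
    assert(ci != NULL);
    atomic_fetch_or(&ci->flags, GUEST_STOPPED_FLAG);
}

/* Guest side: report whether the flag was set, clearing it in the same step. */
bool check_and_clear_guest_paused(struct clock_info *ci)
{
    assert(ci != NULL);
    unsigned old = atomic_fetch_and(&ci->flags, ~GUEST_STOPPED_FLAG);
    return (old & GUEST_STOPPED_FLAG) != 0;
}

/* Watchdog side: a lockup seen right after a host stop is not reported. */
bool should_warn_soft_lockup(struct clock_info *ci, bool lockup_detected)
{
    if (!lockup_detected)
        return false;
    return !check_and_clear_guest_paused(ci);
}
```

Clearing the flag in the same step as reading it means one host stop suppresses at most one warning, so a real lockup that follows is still reported.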

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:48:57 +03:00
Alexander Graf
2246f8b563 KVM: PPC: Rework wqp conditional code
On PowerPC, we sometimes use a waitqueue per core, not per thread,
so we can't always use the vcpu internal waitqueue.

This code has been generalized by Christoffer Dall recently, but
unfortunately broke compilation for PowerPC. At the time the helper
function is defined, struct kvm_vcpu is not declared yet, so we can't
dereference it.

This patch moves all logic into the generic inline function, at which
time we have all information necessary.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:47:49 +03:00
Christoffer Dall
b6d33834bd KVM: Factor out kvm_vcpu_kick to arch-generic code
The kvm_vcpu_kick function performs roughly the same functionality on
almost all architectures, so we shouldn't have separate copies.

PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
structure and to accommodate this special need a
__KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
kvm_arch_vcpu_wq have been defined. For all other architectures this
is a generic inline that just returns &vcpu->wq.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:47:47 +03:00
Avi Kivity
66ef89315f KVM: schedule debugfs statistics for removal
Deprecated in favour of tracepoints.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:47:32 +03:00
Jason Wang
675acb758a KVM: SVM: count all irq windows exit
Also count the exits taken on the fast path.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:47:01 +03:00
Amos Kong
786a9f888b KVM: set upper bounds for iobus dev to limit userspace
kvm_io_bus devices are used for ioevent, pit, pic, ioapic, and
coalesced_mmio.

Currently QEMU emulates only one PCI bus.  The bus has 32 slots and each
slot has 8 functions, so the maximum number of supported PCI devices is
1 * 32 * 8 = 256.  One virtio-blk takes one iobus device and one
virtio-net (vhost=on) takes two iobus devices.  The maximum number of
coalesced mmio zones is 100, and each zone has one iobus device.  So 300
io_bus devices are not enough.

Set an upper bound for kvm_io_range to limit userspace.  1000 is a very
large limit and does not bloat the typical user.
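
The arithmetic in this message can be checked with a small sketch. The constants come from the text above; the enum names and iobus_devices_needed() are hypothetical, and the per-device costs are only the ones the message states.

```c
#include <assert.h>

enum {
    PCI_BUSES            = 1,
    SLOTS_PER_BUS        = 32,
    FUNCS_PER_SLOT       = 8,
    MAX_PCI_DEVICES      = PCI_BUSES * SLOTS_PER_BUS * FUNCS_PER_SLOT, /* 256 */
    COALESCED_MMIO_ZONES = 100,
    OLD_LIMIT            = 300,  /* the figure called "not enough" */
    NEW_LIMIT            = 1000  /* the upper bound chosen by the patch */
};

/*
 * iobus devices consumed by a guest: one per virtio-blk, two per
 * virtio-net with vhost=on, one per coalesced mmio zone.
 */
unsigned iobus_devices_needed(unsigned virtio_blk,
                              unsigned virtio_net_vhost,
                              unsigned zones)
{
    assert(virtio_blk + virtio_net_vhost <= MAX_PCI_DEVICES);
    return virtio_blk + 2 * virtio_net_vhost + zones;
}
```

Under these assumptions a bus full of virtio-blk devices plus 100 zones needs 356 entries, which already exceeds 300, while a bus full of vhost virtio-net devices plus 100 zones needs 612, which stays below 1000.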

Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:47:00 +03:00
Amos Kong
a13007160f KVM: resize kvm_io_range array dynamically
This patch allows the kvm_io_range array to be resized dynamically.

Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:46:58 +03:00
Liu, Jinsong
83c529151a KVM: x86: expose Intel cpu new features (HLE, RTM) to guest
Intel recently released two new features, HLE and RTM.
Refer to http://software.intel.com/file/41417.
This patch exposes them to the guest.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08 12:46:32 +03:00
Linus Torvalds
0034102808 Linux 3.4-rc2 2012-04-07 18:30:41 -07:00
Linus Torvalds
f4e52e7ffd regmap: A couple of small fixes for 3.4
Two more small fixes:
 - Now we have users for it that aren't running Android it turns out that
   regcache_sync_region() is much more useful to drivers if it's exported
   for use by modules.  Who knew?
 - Make sure we don't divide by zero when doing debugfs dumps of rbtrees,
   not visible up until now because everything was providing at least
   some cache on startup.

Merge tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull two more small regmap fixes from Mark Brown:
 - Now we have users for it that aren't running Android it turns out
   that regcache_sync_region() is much more useful to drivers if it's
   exported for use by modules.  Who knew?
 - Make sure we don't divide by zero when doing debugfs dumps of
   rbtrees, not visible up until now because everything was providing at
   least some cache on startup.

* tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: prevent division by zero in rbtree_show
  regmap: Export regcache_sync_region()
2012-04-07 09:56:00 -07:00
Linus Torvalds
a3fac08085 Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull a few KVM fixes from Avi Kivity:
 "A bunch of powerpc KVM fixes, a guest and a host RCU fix (unrelated),
  and a small build fix."

* 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Resolve RCU vs. async page fault problem
  KVM: VMX: vmx_set_cr0 expects kvm->srcu locked
  KVM: PMU: Fix integer constant is too large warning in kvm_pmu_set_msr()
  KVM: PPC: Book3S: PR: Fix preemption
  KVM: PPC: Save/Restore CR over vcpu_run
  KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry
  KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears exist
  KVM: PPC: Book3S: Compile fix for ppc32 in HIOR access code
2012-04-07 09:53:33 -07:00
Linus Torvalds
664481ed45 SuperH updates for 3.4-rc1

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: fix clock-sh7757 for the latest sh_mobile_sdhi driver
  serial: sh-sci: use serial_port_in/out vs sci_in/out.
  sh: vsyscall: Fix up .eh_frame generation.
  sh: dma: Fix up device attribute mismatch from sysdev fallout.
  sh: dwarf unwinder depends on SHcompact.
  sh: fix up fallout from system.h disintegration.
2012-04-07 09:52:46 -07:00
Linus Torvalds
d6a624eef1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer fixlet from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  sysctl: fix write access to dmesg_restrict/kptr_restrict
2012-04-07 09:51:36 -07:00
Linus Torvalds
f21fec96ea Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI & Power Management patches from Len Brown:
 "Two fixes for cpuidle merge-window changes, plus a URL fix in
  MAINTAINERS"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  MAINTAINERS: Update git url for ACPI
  cpuidle: Fix panic in CPU off-lining with no idle driver
  ACPI processor: Use safe_halt() rather than halt() in acpi_idle_play_dead()
2012-04-06 19:56:04 -07:00
Linus Torvalds
a0421da44f Merge branch '3.4-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
 "Pull two tcm_fc fabric related fixes for -rc2:

  Note that both have been CC'ed to stable, and patch #1 is the
  important one that addresses a memory corruption bug related to FC
  exchange timeouts + command abort.

  Thanks again to MDR for tracking down this issue!"

* '3.4-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  tcm_fc: Do not free tpg structure during wq allocation failure
  tcm_fc: Add abort flag for gracefully handling exchange timeout
2012-04-06 19:54:26 -07:00
Mark Rustad
06383f10c4 tcm_fc: Do not free tpg structure during wq allocation failure
Avoid freeing a registered tpg structure if an alloc_workqueue call
fails.  This fixes a bug where the failure was leaking memory associated
with se_portal_group setup during the original core_tpg_register() call.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Kiran Patil <Kiran.patil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-06 18:57:05 -07:00
Mark Rustad
e1c4038282 tcm_fc: Add abort flag for gracefully handling exchange timeout
Add abort flag and use it to terminate processing when an exchange
is timed out or is reset. The abort flag is used in place of the
transport_generic_free_cmd function call in the reset and timeout
cases, because calling that function in that context would free
memory that was in use. The aborted flag allows the lifetime to
be managed in a more normal way, while truncating the processing.

This change eliminates a source of memory corruption which
manifested in a variety of ugly ways.

(nab: Drop unused struct fc_exch *ep in ft_recv_seq)

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Kiran Patil <Kiran.patil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-06 18:56:43 -07:00
Len Brown
eeaab2d8af Merge branches 'idle-fix' and 'misc' into release 2012-04-06 21:48:59 -04:00
Igor Murzov
aaef292acf MAINTAINERS: Update git url for ACPI
Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06 21:41:43 -04:00
Linus Torvalds
4157368edb Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile bug fixes from Chris Metcalf:
 "This includes Paul Gortmaker's change to fix the <asm/system.h>
  disintegration issues on tile, a fix to unbreak the tilepro ethernet
  driver, and a backlog of bugfix-only changes from internal Tilera
  development over the last few months.

  They have all been to LKML and on linux-next for the last few days.
  The EDAC change to MAINTAINERS is an oddity but discussion on the
  linux-edac list suggested I ask you to pull that change through my
  tree since they don't have a tree to pull edac changes from at the
  moment."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits)
  drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing
  MAINTAINERS: update EDAC information
  tilepro ethernet driver: fix a few minor issues
  tile-srom.c driver: minor code cleanup
  edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
  arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally
  arch/tile: remove bogus performance optimization
  arch/tile: return SIGBUS for addresses that are unaligned AND invalid
  arch/tile: fix finv_buffer_remote() for tilegx
  arch/tile: use atomic exchange in arch_write_unlock()
  arch/tile: stop mentioning the "kvm" subdirectory
  arch/tile: export the page_home() function.
  arch/tile: fix pointer cast in cacheflush.c
  arch/tile: fix single-stepping over swint1 instructions on tilegx
  arch/tile: implement panic_smp_self_stop()
  arch/tile: add "nop" after "nap" to help GX idle power draw
  arch/tile: use proper memparse() for "maxmem" options
  arch/tile: fix up locking in pgtable.c slightly
  arch/tile: don't leak kernel memory when we unload modules
  arch/tile: fix bug in delay_backoff()
  ...
2012-04-06 17:56:20 -07:00
Linus Torvalds
9479f0f801 Two fixes for regressions:
  * one is a workaround that will be removed in v3.5 with proper fix in
    the tip/x86 tree,
  * the other is to fix drivers to load on PV (a previous patch made them only
    load in PVonHVM mode).
 
 The rest are just minor fixes in the various drivers and some cleanup in the
 core code.

Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
 "Two fixes for regressions:
   * one is a workaround that will be removed in v3.5 with proper fix in
     the tip/x86 tree,
   * the other is to fix drivers to load on PV (a previous patch made
     them only load in PVonHVM mode).

  The rest are just minor fixes in the various drivers and some cleanup
  in the core code."

* tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
  xen/pciback: fix XEN_PCI_OP_enable_msix result
  xen/smp: Remove unnecessary call to smp_processor_id()
  xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
  xen: only check xen_platform_pci_unplug if hvm
2012-04-06 17:54:53 -07:00
Linus Torvalds
1ddca05743 MMC fixes for 3.4-rc2:
The major fixes here are:
   * Disable use of MSI in sdhci-pci, which caused multiple chipsets to
     stop working in 3.4-rc1.  I'll wait to turn this on again until we
     have a chipset whitelist for it.
   * Fix a libertas SDIO powered-resume regression introduced in 3.3;
     thanks to Neil Brown and Rafael Wysocki for this fix.
   * Fix module reloading on omap_hsmmc.
   * Stop trusting the spec/card's specified maximum data timeout length,
     and use three seconds instead.  Previously we used 300ms.
 
  Also cleanups and fixes for s3c, atmel, sh_mmcif and omap_hsmmc.

Merge tag 'mmc-fixes-for-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
 - Disable use of MSI in sdhci-pci, which caused multiple chipsets to
   stop working in 3.4-rc1.  I'll wait to turn this on again until we
   have a chipset whitelist for it.
 - Fix a libertas SDIO powered-resume regression introduced in 3.3;
   thanks to Neil Brown and Rafael Wysocki for this fix.
 - Fix module reloading on omap_hsmmc.
 - Stop trusting the spec/card's specified maximum data timeout length,
   and use three seconds instead.  Previously we used 300ms.

Also cleanups and fixes for s3c, atmel, sh_mmcif and omap_hsmmc.

* tag 'mmc-fixes-for-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (28 commits)
  mmc: use really long write timeout to deal with crappy cards
  mmc: sdhci-dove: Fix compile error by including module.h
  mmc: Prevent 1.8V switch for SD hosts that don't support UHS modes.
  Revert "mmc: sdhci-pci: Add MSI support"
  Revert "mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers"
  mmc: core: fix power class selection
  mmc: omap_hsmmc: fix module re-insertion
  mmc: omap_hsmmc: convert to module_platform_driver
  mmc: omap_hsmmc: make it behave well as a module
  mmc: omap_hsmmc: trivial cleanups
  mmc: omap_hsmmc: context save after enabling runtime pm
  mmc: omap_hsmmc: use runtime put sync in probe error patch
  mmc: sdio: Use empty system suspend/resume callbacks at the bus level
  mmc: bus: print bus speed mode of UHS-I card
  mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers
  mmc: sh_mmcif: Simplify calculation of mmc->f_min
  mmc: sh_mmcif: mmc->f_max should be half of the bus clock
  mmc: sh_mmcif: double clock speed
  mmc: block: Remove use of mmc_blk_set_blksize
  mmc: atmel-mci: add support for odd clock dividers
  ...
2012-04-06 17:22:23 -07:00
Linus Torvalds
f68e556e23 Make the "word-at-a-time" helper functions more commonly usable
I have a new optimized x86 "strncpy_from_user()" that will use these
same helper functions for all the same reasons the name lookup code uses
them.  This is preparation for that.

This moves them into an architecture-specific header file.  It's
architecture-specific for two reasons:

 - some of the functions are likely to want architecture-specific
   implementations.  Even if the current code happens to be "generic" in
   the sense that it should work on any little-endian machine, it's
   likely that the "multiply by a big constant and shift" implementation
   is less than optimal for an architecture that has a guaranteed fast
   bit count instruction, for example.

 - I expect that if architectures like sparc want to start playing
   around with this, we'll need to abstract out a few more details (in
   particular the actual unaligned accesses).  So we're likely to have
   more architecture-specific stuff if non-x86 architectures start using
   this.

   (and if it turns out that non-x86 architectures don't start using
   this, then having it in an architecture-specific header is still the
   right thing to do, of course)
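
As an illustration of the trade-off in the first point, here is a userspace sketch of two ways to find the first zero byte in a 64-bit little-endian word: one using a multiply and shift, one using a bit count builtin. The function names are hypothetical, the constants are the ones commonly used for this trick and are not quoted from the patch, and __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Assemble 8 bytes into a word so that byte 0 is least significant. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)p[i] << (8 * i);
    return w;
}

/* Non-zero iff some byte of w is zero; the lowest set bit marks the first one. */
uint64_t has_zero(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Generic variant: build a mask of 0xff bytes below the first zero byte,
 * then count those bytes with a multiply and a shift. */
unsigned first_zero_multiply(uint64_t w)
{
    uint64_t z = has_zero(w);
    assert(z != 0);
    uint64_t below = ((z - 1) & ~z) >> 7;
    return (unsigned)((below * UINT64_C(0x0001020304050608)) >> 56);
}

/* Variant for machines with a fast bit count instruction. */
unsigned first_zero_ctz(uint64_t w)
{
    uint64_t z = has_zero(w);
    assert(z != 0);
    return (unsigned)(__builtin_ctzll(z) >> 3);
}
```

Both variants return the same index; which one is faster depends on the machine, which is the argument for keeping the helpers in an architecture-specific header.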

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-06 13:54:56 -07:00
Toshi Kani
ee01e66337 cpuidle: Fix panic in CPU off-lining with no idle driver
Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered.  A cpuidle
driver may be registered at boot-time based on CPU type.  This patch
allows an off-lined CPU to enter HLT-based idle in this condition.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06 15:01:25 -04:00
Linus Torvalds
23f347ef63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

 3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

 4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

 5) Ipv6 multicast code uses wrong loop index, from RongQing Li.

 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

 7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

 8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

 9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't.  Let the natural sequence
    generated by the splice paths, and the associated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers.  Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    its downstream can't take data any more.

14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
    changes, re-add.  From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...
2012-04-06 10:37:38 -07:00
Jan Beulich
f09d8432e3 xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
The original XenoLinux code has always had things this way, and for
compatibility reasons (in particular with a subsequent pciback
adjustment) upstream Linux should behave the same way (allowing for two
distinct error indications to be returned by the backend).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06 12:16:02 -04:00
Jan Beulich
0ee46eca04 xen/pciback: fix XEN_PCI_OP_enable_msix result
Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
positive value to indicate the number of vectors (less than the amount
requested) that can be set up for a given device. Returning this as an
operation value (secondary result) is fine, but (primary) operation
results are expected to be negative (error) or zero (success) according
to the protocol. With the frontend fixed to match the XenoLinux
behavior, the backend can now validly return zero (success) here,
passing the upper limit on the number of vectors in op->value.
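
The result convention described above can be sketched like this. It is an illustration only: struct msix_op is a hypothetical stand-in for the shared operation record, and report_enable_msix() is not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the operation record shared with the frontend. */
struct msix_op {
    int err;   /* primary result: negative (error) or zero (success) */
    int value; /* secondary result: upper limit on vectors, if any */
};

/*
 * Map a pci_enable_msix()-style return value onto the protocol:
 * a positive count becomes success plus a vector limit in value.
 */
void report_enable_msix(struct msix_op *op, int result)
{
    assert(op != NULL);
    if (result > 0) {
        op->err = 0;
        op->value = result;
    } else {
        op->err = result;
        op->value = 0;
    }
}
```

With this mapping the primary result never carries a positive number, so a frontend that treats any non-zero primary result as an error keeps working.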

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06 12:13:55 -04:00
Srivatsa S. Bhat
e8c9e788f4 xen/smp: Remove unnecessary call to smp_processor_id()
There is an extra and unnecessary call to smp_processor_id()
in cpu_bringup(). Remove it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06 12:13:30 -04:00
Konrad Rzeszutek Wilk
2531d64b6f xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
The above mentioned patch checks the IOAPIC and if it contains
-1, then it unmaps said IOAPIC. But under Xen we get this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
PGD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in:

Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
1525                  /0U990C
RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
Stack:
 00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
 ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
 0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
Call Trace:
 [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
 [<ffffffff8100a9b2>] ? check_events+0x12/0x20
 [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
 [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
 [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
 [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
 [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
 [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
 [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
 [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
 [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20

We die because acpi_get_override_irq() is called to return the polarity
and trigger for the IRQs. That function calls mp_find_ioapics to get the
'struct ioapic' structure, which along with mp_irq[x] is used to figure out
the default values and the polarity/trigger overrides. Since mp_find_ioapics
now returns -1 [because the IOAPIC is filled with 0xffffffff],
acpi_get_override_irq() stops looking up the proper INT_SRC_OVR in
mp_irq[x] and we can't install the SCI interrupt.

The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
struct so that platforms can override it. But for v3.4 let's carry this
work-around. This patch does that by providing a slightly different variant
of the fake IOAPIC entries.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06 12:13:06 -04:00
Igor Mammedov
e95ae5a493 xen: only check xen_platform_pci_unplug if hvm
Commit b9136d207f08 ('xen: initialize platform-pci even if
xen_emul_unplug=never') breaks blkfront/netfront: they are not loaded
because xen_platform_pci_unplug is 0, and it is never set for a PV guest.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-04-06 12:12:52 -04:00
Eric Dumazet
110c43304d net: fix a race in sock_queue_err_skb()
As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 05:07:21 -04:00
Eric Dumazet
4a7e7c2ad5 netlink: fix races after skb queueing
As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 04:21:06 -04:00
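The two fixes above share one pattern: read whatever is needed from the buffer before queueing it, because ownership passes to the consumer at the moment of queueing. The following is a single-threaded sketch of that ordering only. `struct buf`, `struct queue` and `enqueue_and_get_len()` are hypothetical names, not the kernel's skb API, and the real code also involves locking that is omitted here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer and one-slot queue, standing in for skb and its queue. */
struct buf {
    size_t len;
};

struct queue {
    struct buf *slot;
};

/*
 * Queue the buffer and return its length.
 * The length is copied out first: once the buffer is visible in the
 * queue, a consumer may free it, so b must not be dereferenced again.
 */
size_t enqueue_and_get_len(struct queue *q, struct buf *b)
{
    size_t len = b->len;  /* read before handing over ownership */

    q->slot = b;          /* from here on, b belongs to the consumer */
    return len;           /* safe: uses the saved copy, not b->len */
}

/* Usage example: the consumer frees the buffer right after it is queued. */
void enqueue_example(void)
{
    struct queue q = { NULL };
    struct buf *b = malloc(sizeof(*b));

    assert(b != NULL);
    b->len = 128;
    size_t len = enqueue_and_get_len(&q, b);
    free(q.slot);         /* consumer side: buffer is gone */
    q.slot = NULL;
    assert(len == 128);   /* the saved length is still valid */
}
```

Returning `b->len` after the assignment to `q->slot` would be the use-after-free risk the commit messages describe.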
Ben Hutchings
e34fac1c2e doc, net: Update ndo_start_xmit return type and values
Commit dc1f8bf68b ('netdev: change
transmit to limited range type') changed the required return type and
9a1654ba0b ('net: Optimize
hard_start_xmit() return checking') changed the valid numerical
return values.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:13 -04:00
Ben Hutchings
de7aca16fd doc, net: Remove instruction to set net_device::trans_start
Commit 08baf56108 ('net:
txq_trans_update() helper') made it unnecessary for most drivers to
set net_device::trans_start (or netdev_queue::trans_start).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:13 -04:00
Ben Hutchings
b3cf65457f doc, net: Update netdev operation names
Commits d314774cf2 ('netdev: network
device operations infrastructure') and
008298231a ('netdev: add more functions
to netdevice ops') moved and renamed net device operation pointers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:12 -04:00
Ben Hutchings
04fd3d3515 doc, net: Update documentation of synchronisation for TX multiqueue
Commits e308a5d806 ('netdev: Add
netdev->addr_list_lock protection.') and
e8a0464cc9 ('netdev: Allocate multiple
queues for TX.') introduced more fine-grained locks.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:12 -04:00
Ben Hutchings
93b6a3adbd doc, net: Remove obsolete reference to dev->poll
Commit bea3348eef ('[NET]: Make NAPI
polling independent of struct net_device objects.') removed the
automatic disabling of NAPI polling by dev_close(), and drivers
must now do this themselves.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:12 -04:00
Ben Hutchings
b4f79e5cb2 ethtool: Remove exception to the requirement of holding RTNL lock
Commit e52ac3398c ('net: Use device
model to get driver name in skb_gso_segment()') removed the only
in-tree caller of ethtool ops that doesn't hold the RTNL lock.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 02:43:12 -04:00