Commit Graph

518008 Commits

Author SHA1 Message Date
Ilya Dryomov
9571eb4f96 libceph: simplify our debugfs attr macro
No need to do single_open()'s job ourselves.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:39 +03:00
Ilya Dryomov
ff7eeb82cc ceph: show non-default options only
Don't pollute /proc/mounts with default options (presently these are
dcache, nofsc and acl).  Leave the acl/noacl however - it's a bit of
a special case due to CONFIG_CEPH_FS_POSIX_ACL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:39 +03:00
Ilya Dryomov
5cf7bd3012 libceph: expose client options through debugfs
Add a client_options attribute for showing libceph options.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:39 +03:00
Ilya Dryomov
ff40f9ae95 libceph, ceph: split ceph_show_options()
Split ceph_show_options() into two pieces and move the piece
responsible for printing client (libceph) options into net/ceph.  This
way people adding a libceph option wouldn't have to remember to update
code in fs/ceph.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:38 +03:00
Ilya Dryomov
d8a2c89c86 rbd: mark block queue as non-rotational
Set QUEUE_FLAG_NONROT.  Following commit b277da0a8a ("block: disable
entropy contributions for nonrot devices") we should also clear
QUEUE_FLAG_ADD_RANDOM, but it's off by default for blk-mq drivers, so
just note it in the comment.

Also remove physical block size assignment - no sense in repeating
defaults that are not going to change.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:38 +03:00
Ilya Dryomov
67c64eb742 libceph: don't overwrite specific con error msgs
- specific con->error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d5c ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:37 +03:00
Yan, Zheng
1c841a96b5 ceph: cleanup unsafe requests when reconnecting is denied
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:37 +03:00
Yan, Zheng
a9f6eb6185 ceph: don't zero i_wrbuffer_ref when reconnecting is denied
remove_session_caps_cb() does not truncate dirty data in page
cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will
result negtive i_wrbuffer_ref/i_wrbuffer_ref_head

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Yan, Zheng
571ade336a ceph: don't mark dirty caps when there is no auth cap
No i_auth_cap means reconnecting to MDS was denied. So don't
add new dirty caps.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:36 +03:00
Yan, Zheng
db40cc1702 ceph: keep i_snap_realm while there are writers
when reconnecting to MDS is denied, we remove session caps
forcibly. But it's possible there are ongoing write, the
write code needs to reference i_snap_realm. So if there are
ongoing write, we keep i_snap_realm.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:35 +03:00
Joe Perches
3ef650d398 libceph: osdmap.h: Add missing format newlines
To avoid possible interleaving, add missing '\n' to formats.

Convert pr_warning to pr_warn while there.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:35 +03:00
Sanidhya Kashyap
a149bb9a28 ceph: kstrdup() memory handling
Currently, there is no check for the kstrdup() for r_path2,
r_path1 and snapdir_name as various locations as there is a
possibility of failure during memory pressure. Therefore,
returning ENOMEM where the checks have been missed.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:34 +03:00
Taesoo Kim
c1d00b2d9c ceph: properly release page upon error
When ceph_update_writeable_page fails (including -EAGAIN), it
unlocks (w/ unlock_page) the page but does not 'release'
(w/ page_cache_release) properly.

Upon error, properly set *pagep to NULL, indicating an error.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 18:55:34 +03:00
Ilya Dryomov
1fe480235a rbd: be more informative on -ENOENT failures
pr_info what exactly was the culprit: missing pool, image or snap.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-20 18:55:33 +03:00
Olof Johansson
cdaa8cf348 Merge branch 'fixes' into next/fixes-non-critical
Merge a set of fixes that we missed sending in before v4.0 release. These
will also be sent to -stable.

* fixes: (659 commits)
  ARM: at91/dt: sama5d3 xplained: add phy address for macb1
  kbuild: Create directory for target DTB
  ARM: mvebu: Disable CPU Idle on Armada 38x
  arm64: juno: Fix misleading name of UART reference clock
  ARM: dts: sunxi: Remove overclocked/overvoltaged OPP
  ARM: dts: sun4i: a10-lime: Override and remove 1008MHz OPP setting
  ARM: socfpga: dts: fix spi1 interrupt
  ARM: dts: Fix gpio interrupts for dm816x
  ARM: dts: dra7: remove ti,hwmod property from pcie phy
  ARM: EXYNOS: Fix build breakage cpuidle on !SMP
  ARM: OMAP: dmtimer: disable pm runtime on remove
  ARM: OMAP: dmtimer: check for pm_runtime_get_sync() failure
  ARM: dts: fix lid and power pin-functions for exynos5250-spring
  ARM: dts: fix mmc node updates for exynos5250-spring
  ARM: OMAP2+: Fix socbus family info for AM33xx devices
  ARM: dts: omap3: Add missing dmas for crypto
  + Linux 4.0-rc4

Signed-off-by: Olof Johansson <olof@lixom.net>
2015-04-20 07:59:04 -07:00
Nicholas Mc Guire
57e95460f0 ceph: match wait_for_completion_timeout return type
return type of wait_for_completion_timeout is unsigned long not int. An
appropriately named unsigned long is added and the assignment fixed up.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:23 +03:00
Nicholas Mc Guire
3563dbdd99 ceph: use msecs_to_jiffies for time conversion
This is only an API consolidation and should make things more readable
it replaces var * HZ / 1000 by msecs_to_jiffies(var).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Fabian Frederick
e1eba3ea02 ceph: remove redundant declaration
ceph_aops was already defined extern in addr.c section

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Yan, Zheng
e2c3de046c ceph: fix dcache/nocache mount option
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Yan, Zheng
6e6f09231a ceph: drop cap releases in requests composed before cap reconnect
These cap releases are stale because MDS will re-establish client
caps according to the cap reconnect messages.

Note: MDS can detect stale cap messages, so these stale cap
releases are harmless even we don't drop them.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-04-20 17:30:22 +03:00
Tomi Valkeinen
6b75b54c84 Merge omapdss topic branch for fbdev 4.1 2015-04-20 12:09:31 +03:00
Grygorii Strashko
aa977f62df omapdss: extend pm notifier to handle hibernation
Add handling of missed events in omap_dss_pm_notif which are
needed to support hibernation (suspend to disk).

Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-04-20 12:09:04 +03:00
Javier Martinez Canillas
34260a79b2 OMAPDSS: Correct video ports description file path in DT binding doc
The doc refers to Documentation/devicetree/bindings/video/video-ports.txt
which does not exist. The documentation seems to be outdated and wants to
refer to Documentation/devicetree/bindings/graph.txt instead.

Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2015-04-20 12:09:04 +03:00
Tomi Valkeinen
cb17a4ae3b OMAPDSS: disable VT switch
We don't need VT switch when suspending/resuming, so disable it. This
speeds up suspend/resume.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: NeilBrown <neil@brown.name>
2015-04-20 12:09:04 +03:00
Dave Airlie
2c33ce009c Merge Linus master into drm-next
The merge is clean, but the arm build fails afterwards,
due to API changes in the regulator tree.

I've included the patch into the merge to fix the build.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-20 13:05:20 +10:00
Philipp Zabel
cec32a4701 media-bus: Fixup RGB444_1X12, RGB565_1X16, and YUV8_1X24 media bus format
Change the constant values for RGB444_1X12, RGB565_1X16, and YUV8_1X24 media
bus formats in anticipation of a merge conflict with the media tree, where
the old values are already taken by RBG888_1X24, RGB888_1X32_PADHI, and
VUY8_1X24, respectively.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-20 11:23:56 +10:00
Linus Torvalds
09d51602cf Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat update from Len Brown:
 "Updates to the turbostat utility.

  Just one kernel dependency in this batch -- added a #define to
  msr-index.h"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: correct dumped pkg-cstate-limit value
  tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL
  tools/power turbostat: correct DRAM RAPL units on recent Xeon processors
  tools/power turbostat: Initial Skylake support
  tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
  tools/power turbostat: modprobe msr, if needed
  tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2
  tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names
  x86 msr-index: define MSR_TURBO_RATIO_LIMIT,1,2
  tools/power turbostat: label base frequency
  tools/power turbostat: update PERF_LIMIT_REASONS decoding
  tools/power turbostat: simplify default output
2015-04-19 14:31:41 -07:00
Linus Torvalds
6162e4b0be A few bug fixes and add support for file-system level encryption in ext4.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVMvVGAAoJEPL5WVaVDYGjjZgH/0Z4bdtQpuQKAd2EoSUhiOh4
 tReqE1IuTU+urrL9qNA4qUFhAKq0Iju0INrnoYNb1+YxZ2myvUrMY4y2GkapaKgZ
 SFYL8LTS7E79/LuR6q1SFmUYoXCjqpWeHb7rAZ9OluSNQhke8SWdywLnp/0q05Go
 6SDwYdT8trxGED/wYTGPy9zMHcYEYHqIIvfFZd3eYtRnaP42Zo5rUvISg3cP0ekG
 LiX2D9Bi9pyqxgMjTG0+0xiC3ohTfXOujyHbnLVQ7kdZmpzZKfQspoczEIUolYb4
 /Ic4qPQQdbtjooQ7uRYUOFXeVjt7HZuTb3aVmh90RWrEhsLsyBmNd9StLFVdlcg=
 =9f7Z
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "A few bug fixes and add support for file-system level encryption in
  ext4"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits)
  ext4 crypto: enable encryption feature flag
  ext4 crypto: add symlink encryption
  ext4 crypto: enable filename encryption
  ext4 crypto: filename encryption modifications
  ext4 crypto: partial update to namei.c for fname crypto
  ext4 crypto: insert encrypted filenames into a leaf directory block
  ext4 crypto: teach ext4_htree_store_dirent() to store decrypted filenames
  ext4 crypto: filename encryption facilities
  ext4 crypto: implement the ext4 decryption read path
  ext4 crypto: implement the ext4 encryption write path
  ext4 crypto: inherit encryption policies on inode and directory create
  ext4 crypto: enforce context consistency
  ext4 crypto: add encryption key management facilities
  ext4 crypto: add ext4 encryption facilities
  ext4 crypto: add encryption policy and password salt support
  ext4 crypto: add encryption xattr support
  ext4 crypto: export ext4_empty_dir()
  ext4 crypto: add ext4 encryption Kconfig
  ext4 crypto: reserve codepoints used by the ext4 encryption feature
  ext4 crypto: add ext4_mpage_readpages()
  ...
2015-04-19 14:26:31 -07:00
Linus Torvalds
17974c054d hexdump: avoid warning in test function
The test_data_1_le[] array is a const array of const char *.  To avoid
dropping any const information, we need to use "const char * const *",
not just "const char **".

I'm not sure why the different test arrays end up having different
const'ness, but let's make the pointer we use to traverse them as const
as possible, since we modify neither the array of pointers _or_ the
pointers we find in the array.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19 13:48:40 -07:00
Jann Horn
8b01fc86b9 fs: take i_mutex during prepare_binprm for set[ug]id executables
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19 13:46:21 -07:00
Linus Torvalds
5224b9613b smp: Fix error case handling in smp_call_function_*()
Commit 8053871d0f ("smp: Fix smp_call_function_single_async()
locking") fixed the locking for the asynchronous smp-call case, but in
the process of moving the lock handling around, one of the error cases
ended up not unlocking the call data at all.

This went unnoticed on x86, because this is a "caller is buggy" case,
where the caller is trying to call a non-existent CPU.  But apparently
ARM does that (at least under qemu-arm).  Bindly doing cross-cpu calls
to random CPU's that aren't even online seems a bit fishy, but the error
handling was clearly not correct.

Simply add the missing "csd_unlock()" to the error path.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Analyzed-by: Rabin Vincent <rabin@rab.in>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19 13:19:23 -07:00
Rusty Russell
e4afa120c9 cpumask: remove __first_cpu / __next_cpu
They were for use by the deprecated first_cpu() and next_cpu() wrappers,
but sparc used them directly.

They're now replaced by cpumask_first / cpumask_next.  And __next_cpu_nr
is completely obsolete.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
2015-04-19 14:35:32 +09:30
Linus Torvalds
64fb1d0e97 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller
 "Unfortunately, I brown paper bagged the generic iommu pool allocator
  by applying the wrong revision of the patch series.

  This reverts the bad one, and puts the right one in"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  iommu-common: Fix PARISC compile-time warnings
  sparc: Make LDC use common iommu poll management functions
  sparc: Make sparc64 use scalable lib/iommu-common.c functions
  Break up monolithic iommu table/lock into finer graularity pools and lock
  sparc: Revert generic IOMMU allocator.
2015-04-18 18:01:29 -04:00
Linus Torvalds
dba94f2155 9p: patches for 4.1 merge window
Some accumulated cleanup patches for kerneldoc and unused variables
 as well as some lock bug fixes and adding privateport option for RDMA.
 
 A quick check shows some merge-conflicts versus current-tip on
    9p: use unsigned integers for nwqid/count
 If you would prefer I can rebase, remerge and fix the patch but didn't
 want to do that and look the for-next references.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJVMqXZAAoJEDZk62b0Tg6xBD4P/03nCkTxE5qDN9TVUSNdwHQD
 Oyq3JvvmfOORDHy7pZMp7wTdU4OLz+78RHYprpgJCk4Vs8Gcnl3hloeZ3L9l/W7J
 tz2Ek1noEE9uZLmeH6WPzSaba0sFOlnjbWPsLE8O84/zHOI/qj75s0UDPdrFRt1x
 LvMNQlTZqgUx0hogq1yLFKjp49bUzph78gMaJkoKK+30q9B4skPRRV93HLLzlo9j
 0dAGd0yhO8xUjtlm/ZkXIKiyeGeQ2XXj6UTnH6/4nwL29yVosWkGNjqIXkgz+ROu
 eyPvJqrjaBVtj8ZJkwfyZqM6xPrnsEbuSYUKLT2GcId87Ycebd7Wq1w+vhAO7l0H
 N1ZnzMGlQXHTszEhDGVCICCv1QU8b3ifvtA+nQYUly9JnDeIBcZGQ16g0oYQNoes
 1L6XKsrX4wdxROHYLqRJoNQ120KcaXAnRE3AmT8emiU8gl0KWW0TJ7WpLs9ICKRg
 cwgz1UzeGb/GGRtCv0gTlAE07fe/OjQVrSM3Q+ivTA+juRE2MWvluYh/WAMQHdFV
 FnJ5/sPKbcGK+IrHNWktkTLm2ZbbdcDnWHLmtk3egT3IubY5iLVpa5ADV47WsLAa
 viDp7N3mK0kZL8BJHgPs+aspRwMAHavme/EWzkuRTL048ABo8uTrM/BXiYsAaBBI
 GGh4+vEwcFDQdg2gMbF9
 =2sr2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9pfs updates from Eric Van Hensbergen:
 "Some accumulated cleanup patches for kerneldoc and unused variables as
  well as some lock bug fixes and adding privateport option for RDMA"

* tag 'for-linus-4.1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  net/9p: add a privport option for RDMA transport.
  fs/9p: Initialize status in v9fs_file_do_lock.
  net/9p: Initialize opts->privport as it should be.
  net/9p: use memcpy() instead of snprintf() in p9_mount_tag_show()
  9p: use unsigned integers for nwqid/count
  9p: do not crash on unknown lock status code
  9p: fix error handling in v9fs_file_do_lock
  9p: remove unused variable in p9_fd_create()
  9p: kerneldoc warning fixes
2015-04-18 17:45:30 -04:00
David S. Miller
ccb301862a Merge branch 'iommu-generic-allocator'
Sowmini Varadhan says:

====================
Generic IOMMU pooled allocator

Investigation of network performance on Sparc shows a high
degree of locking contention in the IOMMU allocator, and it
was noticed that the PowerPC code has a better locking model.

This patch series tries to extract the generic parts of the
PowerPC code so that it can be shared across multiple PCI
devices and architectures.

v10: resend patchv9 without RFC tag, and a new mail Message-Id,
(previous non-RFC attempt did not show up on the patchwork queue?)

Full revision history below:
v2 changes:
  - incorporate David Miller editorial comments: sparc specific
    fields moved from iommu-common into sparc's iommu_64.h
  - make the npools value an input parameter, for the case when
    the iommu map size is not very large
  - cookie_to_index mapping, and optimizations for span-boundary
    check, for use case such as LDC.

v3: eliminate iommu_sparc, rearrange the ->demap indirection to
    be invoked under the pool lock.

v4: David Miller review changes:
  - s/IOMMU_ERROR_CODE/DMA_ERROR_CODE
  - page_table_map_base and page_table_shift are unsigned long, not u32.

v5: removed ->cookie_to_index and ->demap indirection from the
    iommu_tbl_ops The caller needs to call these functions as needed,
    before invoking the generic arena allocator functions.
    Added the "skip_span_boundary" argument to iommu_tbl_pool_init() for
    those callers like LDC which do no care about span boundary checks.

v6: removed iommu_tbl_ops, and instead pass the ->flush_all as
    an indirection to iommu_tbl_pool_init(); only invoke ->flush_all
    when there is no large_pool, based on the assumption that large-pool
    usage is infrequently encountered

v7: moved pool_hash initialization to lib/iommu-common.c and cleaned up
    code duplication from sun4v/sun4u/ldc.

v8: Addresses BenH comments with one exception: I've left the
    IOMMU_POOL_HASH as is, so that powerpc can tailor it to their
    convenience.  Discard trylock for simple spin_lock to acquire pool

v9: Addresses latest BenH comments: need_flush checks, add support
    for dma mask and align_order.

v10: resend without RFC tag, and new mail Message-Id.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:35:09 -07:00
Sowmini Varadhan
2f0c0fdc08 iommu-common: Fix PARISC compile-time warnings
Fixes warnings due to
- no DMA_ERROR_CODE on PARISC,
- sizeof (unsigned long) == 4 bytes on PARISC.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:34:50 -07:00
Sowmini Varadhan
0ae53ed15d sparc: Make LDC use common iommu poll management functions
Note that this conversion is only being done to consolidate the
code and ensure that the common code provides the sufficient
abstraction. It is not expected to result in any noticeable
performance improvement, as there is typically one ldc_iommu
per vnet_port, and each one has 8k entries, with a typical
request for 1-4 pages.  Thus LDC uses npools == 1.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:32:59 -07:00
Sowmini Varadhan
bb620c3d39 sparc: Make sparc64 use scalable lib/iommu-common.c functions
In iperf experiments running linux as the Tx side (TCP client) with
10 threads results in a severe performance drop when TSO is disabled,
indicating a weakness in the software that can be avoided by using
the scalable IOMMU arena DMA allocation.

Baseline numbers before this patch:
   with default settings (TSO enabled) :    9-9.5 Gbps
   Disable TSO using ethtool- drops badly:  2-3 Gbps.

After this patch, iperf client with 10 threads, can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:32:59 -07:00
Sowmini Varadhan
ff7d37a502 Break up monolithic iommu table/lock into finer graularity pools and lock
Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of  21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_map_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:32:59 -07:00
David S. Miller
c12f048ffd sparc: Revert generic IOMMU allocator.
I applied the wrong version of this patch series, V4 instead
of V10, due to a patchwork bundling snafu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18 12:31:25 -07:00
Len Brown
e9257f5fa4 tools/power turbostat: correct dumped pkg-cstate-limit value
HSW expanded MSR_PKG_CST_CONFIG_CONTROL.Package-C-State-Limit,
from bits[2:0] used by previous implementations, to [3:0].
The value 1000b is unlimited, and is used by BDW and SKL too.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:52 -04:00
Len Brown
8a5bdf41d2 tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL
turbostat --debug
...
CPUID(0x15): eax_crystal: 2 ebx_tsc: 100 ecx_crystal_hz: 0
TSC: 1200 MHz (24000000 Hz * 100 / 2 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:52 -04:00
Andrey Semin
40ee8e3b9d tools/power turbostat: correct DRAM RAPL units on recent Xeon processors
While not yet documented in the Software Developer's Manual,
the data-sheet for modern Xeon states that DRAM RAPL ENERGY units
are fixed at 15.3 uJ, rather than being discovered via MSR.

Before this patch, DRAM energy on these products is over-stated by turbostat
because the RAPL units are 4x larger.

ref: "Xeon E5-2600 v3/E5-1600 v3 Datasheet Volume 2"
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Signed-off-by: Andrey Semin <andrey.semin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:52 -04:00
Len Brown
0b2bb6925e tools/power turbostat: Initial Skylake support
Skylake adds some additional residency counters.

Skylake supports a different mix of RAPL registers
from any previous product.

In most other ways, Skylake is like Broadwell.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Thomas D
f82263c698 tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
Since commit ee0778a301
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using

  [...]
  BUILD_OUTPUT    := $(PWD)
  [...]

which obviously causes trouble when building "turbostat" with

  make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat

because GNU make does not update nor guarantee that $PWD is set.

This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in your root of your
kernel source tree for more details).

Link: https://bugs.gentoo.org/show_bug.cgi?id=533918
Fixes: ee0778a301 ("tools/power: turbostat: make Makefile a bit more capable")
Signed-off-by: Thomas D. <whissi@whissi.de>
Cc: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Len Brown
a21d38c846 tools/power turbostat: modprobe msr, if needed
Some distros (Ubuntu) ship the msr driver as a module.
If turbosat is run as root on those systems, and discovers
that there is no /dev/cpu/cpu0/msr, it will now "modprobe msr"
for the user.

If not root, the modprobe attempt will fail, and turbostat will exit as before:

turbostat: no /dev/cpu/0/msr, Try "# modprobe msr" : No such file or directory

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Len Brown
fcd17211bd tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2
and up to 18 cores of turbo ratio limit
when using the turbostat --debug option.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:50 -04:00
Len Brown
12bb43c615 tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names
s/MSR_NHM_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT/
s/MSR_IVT_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT1/

syntax only -- use the documented strings describing these registers.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:50 -04:00
Linus Torvalds
34a984f7b0 Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull PMEM driver from Ingo Molnar:
 "This is the initial support for the pmem block device driver:
  persistent non-volatile memory space mapped into the system's physical
  memory space as large physical memory regions.

  The driver is based on Intel code, written by Ross Zwisler, with fixes
  by Boaz Harrosh, integrated with x86 e820 memory resource management
  and tidied up by Christoph Hellwig.

  Note that there were two other separate pmem driver submissions to
  lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are
  reasonably happy with this initial version.

  This version enables minimal support that enables persistent memory
  devices out in the wild to work as block devices, identified through a
  magic (non-standard) e820 flag and auto-discovered if
  CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the
  memory maps via the "memmap=..." boot option with the new, special '!'
  modifier character.

  Limitations: this is a regular block device, and since the pmem areas
  are not struct page backed, they are invisible to the rest of the
  system (other than the block IO device), so direct IO to/from pmem
  areas, direct mmap() or XIP is not possible yet.  The page cache will
  also shadow and double buffer pmem contents, etc.

  Initial support is for x86"

* 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
  drivers/block/pmem: Add a driver for persistent memory
  x86/mm: Add support for the non-standard protected e820 type
2015-04-18 11:42:49 -04:00
Linus Torvalds
90d1c08786 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "This tree includes:

   - an FPU related crash fix

   - a ptrace fix (with matching testcase in tools/testing/selftests/)

   - an x86 Kconfig DMA-config defaults tweak to better avoid
     non-working drivers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
  x86/fpu: Load xsave pointer *after* initialization
  x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal()
  x86, selftests: Add single_step_syscall test
2015-04-18 11:31:11 -04:00