Commit Graph

3375 Commits

Author SHA1 Message Date
Paolo Bonzini
61ec84f145 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The highlights are:

* Enable VFIO device on PowerPC, from David Gibson
* Optimizations to speed up IPIs between vcpus in HV KVM,
  from Suresh Warrier (who is also Suresh E. Warrier)
* In-kernel handling of IOMMU hypercalls, and support for dynamic DMA
  windows (DDW), from Alexey Kardashevskiy.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:07 +01:00
Aviv Greenberg
5d8d8db851 [media] UVC: Add support for R200 depth camera
Add support for Intel R200 depth camera in uvc driver.
This includes adding new uvc GUIDs for the new pixel formats,
adding new V4L pixel format definition to user api headers,
and updating the uvc driver GUID-to-4cc tables with the new formats.

Tested-by: Greenberg, Aviv D <aviv.d.greenberg@intel.com>
Signed-off-by: Aviv Greenberg <aviv.d.greenberg@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-03-03 06:49:20 -03:00
Hans Verkuil
87294e9db5 [media] media.h: use hex values for IF and AUDIO entities too
Make the base offset hexadecimal to simplify debugging since the base
addresses are hex too.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-03-03 06:10:14 -03:00
Mauro Carvalho Chehab
664716ad01 Merge branch 'v4l_for_linus' into patchwork
We need to import the changes at media.h, as we have a
followup patch that depends on it.

* v4l_for_linus:
  [media] media.h: use hex values for range offsets,  move connectors base up.
  [media] adv7604: fix tx 5v detect regression
2016-03-03 06:07:41 -03:00
Hans Verkuil
58402b6e98 [media] media.h: use hex values for range offsets, move connectors base up.
Make the base offset hexadecimal to simplify debugging since the base
addresses are hex too.

The offsets for connectors is also changed to start after the 'reserved'
range 0x10000-0x2ffff.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-03-03 05:56:33 -03:00
Pablo Neira Ayuso
8a6bf5da1a netfilter: nft_masq: support port range
Complete masquerading support by allowing port range selection.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-03-02 20:05:27 +01:00
Michael S. Tsirkin
592002f55e virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
Latest virtio spec says the feature bit name is VIRTIO_BLK_F_FLUSH,
VIRTIO_BLK_F_WCE is the legacy name.  virtio blk header says exactly the
reverse - fix that and update driver code to match.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02 17:01:59 +02:00
Greg Kroah-Hartman
71e41bbb43 Merge 4.5-rc6 into usb-next
We want the USB fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01 16:13:54 -08:00
Greg Kroah-Hartman
3e66848a32 Merge 4.5-rc6 into staging-next
We want the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-01 16:10:45 -08:00
Alexey Kardashevskiy
58ded4201f KVM: PPC: Add support for 64bit TCE windows
The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not
enough for directly mapped windows as the guest can get more than 4GB.

This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against
the locked memory limit.

Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
@bus_offset and @page_shift which are also required by DDW.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-02 09:56:50 +11:00
Alexey Kardashevskiy
01d01d6919 KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_64 capability number
This adds a capability number for 64-bit TCE tables support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-02 09:56:50 +11:00
Jamal Hadi Salim
ef6980b6be introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
        action order 1: ife decode action reclassify
         index 1 ref 1 bind 1 installed 14 sec used 14 sec
         type: 0x0
         Metadata: allow mark allow hash allow prio allow qmap
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..

xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:	ethertype 0xdead
xxx:	add skb->mark to whitelist of metadatum to send
xxx:	rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
sudo tc -s filter ls dev $ETH parent 1: protocol ip
xxx:
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 0 success 0)
  match c0a87aed/ffffffff at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )

	action order 1:  skbedit mark 17
	 index 6 ref 1 bind 1
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe
	 index 3 ref 1 bind 1
	 dst MAC: 02:15:15:15:15:15 type: 0xDEAD
 	 Metadata: allow mark
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:15:22 -05:00
David S. Miller
d67703fced Merge tag 'mac80211-next-for-davem-2016-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Here's another round of updates for -next:
 * big A-MSDU RX performance improvement (avoid linearize of paged RX)
 * rfkill changes: cleanups, documentation, platform properties
 * basic PBSS support in cfg80211
 * MU-MIMO action frame processing support
 * BlockAck reordering & duplicate detection offload support
 * various cleanups & little fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:03:27 -05:00
Nikolay Aleksandrov
59f78f9f6c bridge: mcast: add support for more router port information dumping
Allow for more multicast router port information to be dumped such as
timer and type attributes. For that that purpose we need to extend the
MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries
recently. The new format is thus:
[MDBA_ROUTER_PORT] = { <- nested attribute
    u32 ifindex <- router port ifindex for user-space compatibility
    [MDBA_ROUTER_PATTR attributes]
}
This way it remains compatible with older users (they'll simply retrieve
the u32 in the beginning) and new users can parse the remaining
attributes. It would also allow to add future extensions to the router
port without breaking compatibility.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov
a55d8246ab bridge: mcast: add support for temporary port router
Add support for a temporary router port which doesn't depend only on the
incoming query. It can be refreshed if set to the same value, which is
a no-op for the rest.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov
7f0aec7a66 bridge: mcast: use names for the different multicast_router types
Using raw values makes it difficult to extend and also understand the
code, give them names and do explicit per-option manipulation in
br_multicast_set_port_router.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Jiri Pirko
bfcd3a4661 Introduce devlink infrastructure
Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, port splitter and port type setting is implemented.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:29 -05:00
John Fastabend
9e8ce79cd7 net: sched: cls_u32 add bit to specify software only rules
In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.

For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts.
On deletion to avoid stale references in hardware we always try
to remove a rule if it exists.

The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.

Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:05:39 -05:00
Andrzej Hajda
cbf0aceff8 drm/exynos: use arch independent types in uapi header
User API structs should not use types which size/alignment/padding depends
on architecture. The patch fixes it for all structures except
drm_exynos_g2d_userptr, as g2d related stuff seems to be more complicated
and will be reviewed/adjusted later.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-03-01 23:37:24 +09:00
Dave Airlie
efcebcf983 Merge tag 'drm-intel-next-2016-02-14' of git://anongit.freedesktop.org/drm-intel into drm-next
- lots and lots of fbc work from Paulo
- max pixel clock checks from Mika Kahola
- prep work for nv12 offset handling from Ville
- piles of small fixes and refactorings all around

* tag 'drm-intel-next-2016-02-14' of git://anongit.freedesktop.org/drm-intel: (113 commits)
  drm/i915: Update DRIVER_DATE to 20160214
  drm/i915: edp resume/On time optimization.
  agp/intel-gtt: Only register fake agp driver for gen1
  drm/i915: TV pixel clock check
  drm/i915: CRT pixel clock check
  drm/i915: SDVO pixel clock check
  drm/i915: DisplayPort-MST pixel clock check
  drm/i915: HDMI pixel clock check
  drm/i915: DisplayPort pixel clock check
  drm/i915: check that rpm ref is held when accessing ringbuf in stolen mem
  drm/i915: fix error path in intel_setup_gmbus()
  drm/i915: Stop depending upon CONFIG_AGP_INTEL
  agp/intel-gtt: Don't leak the scratch page
  drm/i915: Capture PCI revision and subsytem details in error state
  drm/i915: fix context/engine cleanup order
  drm/i915: Handle PipeC fused off on IVB/HSW/BDW
  drm/i915/skl: Fix typo in DPLL_CFGCR1 definition
  drm/i915: Skip DDI PLL selection for DSI
  drm/i915/skl: Explicitly check for eDP in skl_ddi_pll_select()
  drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
  ...
2016-03-01 13:06:44 +10:00
Faisal Latif
7a43b598a3 i40iw: add entry in rdma_netlink
Add entry for port mapper services.

Changes since v2:
	moved this patch before being used

Changes since v1:
	moved I40IW as last element

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:52 -05:00
Mitko Haralanov
0b091fb32c staging/hfi1: Enable TID caching feature
This commit "flips the switch" on the TID caching feature
implemented in this patch series.

As well as enabling the new feature by tying the new function
with the PSM API, it also cleans up the old unneeded code,
data structure members, and variables.

Due to difference in operation and information, the tracing
functions related to expected receives had to be changed. This
patch include these changes.

The tracing function changes could not be split into a separate
commit without including both tracing variants at the same time.
This would have caused other complications and ugliness.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:39 -05:00
Mitko Haralanov
955ad36dcd uapi/hfi1_user: Add command and event for TID caching
TID caching will use a new event to signal userland that cache
invalidation has occurred and needs a matching command code that
will be used to read the invalidated TIDs.

Add the event bit and the new command to the exported header file.

The command is also added to the switch() statement in file_ops.c
for completeness and in preparation for its usage later.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:38 -05:00
Mitko Haralanov
462075a6ea uapi/hfi1_user: Correct comment for capability bit
The HFI1_CAP_TID_UNMAP comment was incorrectly implying the
opposite of what capability actually did. Correct this error.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-29 17:10:38 -05:00
Shannon Zhao
f577f6c2a6 arm64: KVM: Introduce per-vcpu kvm device controls
In some cases it needs to get/set attributes specific to a vcpu and so
needs something else than ONE_REG.

Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vcpu file descriptor.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:21 +00:00
Shannon Zhao
808e738142 arm64: KVM: Add a new feature bit for PMUv3
To support guest PMUv3, use one bit of the VCPU INIT feature array.
Initialize the PMU when initialzing the vcpu with that bit and PMU
overflow interrupt set.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29 18:34:21 +00:00
Takashi Iwai
eedf5e72c4 ALSA: seq: Remove unimplemented ioctls
SNDRV_SEQ_IOCTL_{GET|SET}_QUEUE_OWNER and *_{GET|SET}_QUEUE_SYNC
ioctls have been never implemented.  Get rid of the definitions from
uapi header file.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-29 18:09:37 +01:00
Florian Westphal
b07edbe1cf netfilter: meta: add PRANDOM support
Can be used to randomly match packets e.g. for statistic traffic sampling.

See commit 3ad0040573
("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs")
for more info why this doesn't use prandom_u32 directly.

Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL
for prandom_seed_full_state too.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-02-29 13:55:59 +01:00
Shuah Khan
7a2eba12ff [media] media: Add ALSA Media Controller function entities
Add ALSA Media Controller capture, playback, and mixer
function entity defines.

[mchehab@osg.samsung.com: fix a trivial merge conflict and start
 MEDIA_ENT_AUDIO_F from 3000]

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-02-27 08:32:54 -03:00
Shuah Khan
5af557a6d2 [media] uapi/media.h: Declare interface types for ALSA
Declare the interface types to be used on alsa for
the new G_TOPOLOGY ioctl.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-02-27 08:14:44 -03:00
David Decotigny
3f1ac7a700 net: ethtool: add new ETHTOOL_xLINKSETTINGS API
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings callbacks.
This API provides support for most legacy ethtool_cmd fields, adds
support for larger link mode masks (up to 4064 bits, variable length),
and removes ethtool_cmd deprecated
fields (transceiver/maxrxpkt/maxtxpkt).

This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
the following backward compatibility properties:
 - legacy ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks.
 - legacy ethtool with new get/set_link_ksettings drivers: the new
   driver callbacks are used, data internally converted to legacy
   ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
   mode mask. ETHTOOL_SSET will fail if user tries to set the
   ethtool_cmd deprecated fields to
   non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
   driver sets higher bits.
 - future ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks, internally converted to new data
   structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
   ignored and seen as 0 from user space. Note that that "future"
   ethtool tool will not allow changes to these deprecated fields.
 - future ethtool with new drivers: direct call to the new callbacks.

By "future" ethtool, what is meant is:
 - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
   fails
 - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
   ETHTOOL_GSET was successful
   + if ETHTOOL_GLINKSETTINGS was successful, then change config with
     ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
     ETHTOOL_SSET).
   + otherwise ETHTOOL_GSET was successful, change config with
     ETHTOOL_SSET. A failure there is final (do not try
     ETHTOOL_SLINKSETTINGS).

The interaction user/kernel via the new API requires a small
ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
length it is expecting from user as a negative length (and cmd field is
0). When kernel and user agree, kernel returns valid info in all
fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).

Data structure crossing user/kernel boundary is 32/64-bit
agnostic. Converted internally to a legal kernel bitmap.

The internal __ethtool_get_settings kernel helper will gradually be
replaced by __ethtool_get_link_ksettings by the time the first
"link_settings" drivers start to appear. So this patch doesn't change
it, it will be removed before it needs to be changed.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 22:06:45 -05:00
Tom Herbert
a87cb3e48e net: Facility to report route quality of connected sockets
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 22:01:22 -05:00
David Ahern
f1705ec197 net: ipv6: Make address flushing on ifdown optional
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
        inet6 2100:1::2/120 scope global tentative
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
           valid_lft forever preferred_lft forever

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 21:45:15 -05:00
Linus Walleij
214338e372 gpio: present the consumer of a line to userspace
I named the field representing the current user of GPIO line as
"label" but this is too vague and ambiguous. Before anyone gets
confused, rename it to "consumer" and indicate clearly in the
documentation that this is a string set by the user of the line.

Also clean up leftovers in the documentation.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-02-25 21:07:23 +01:00
Daniel Borkmann
2da897e51d bpf: fix csum setting for bpf_set_tunnel_key
The fix in 35e2d1152b ("tunnels: Allow IPv6 UDP checksums to be correctly
controlled.") changed behavior for bpf_set_tunnel_key() when in use with
IPv6 and thus uncovered a bug that TUNNEL_CSUM needed to be set but wasn't.
As a result, the stack dropped ingress vxlan IPv6 packets, that have been
sent via eBPF through collect meta data mode due to checksum now being zero.

Since after LCO, we enable IPv4 checksum by default, so make that analogous
and only provide a flag BPF_F_ZERO_CSUM_TX for the user to turn it off in
IPv4 case.

Fixes: 35e2d1152b ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.")
Fixes: c6c3345407 ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-24 16:23:47 -05:00
Beni Lev
0c9ca11b1a cfg80211: Add global RRM capability
Today, the supplicant will add the RRM capabilities
Information Element in the association request only if
Quiet period is supported (NL80211_FEATURE_QUIET).

Quiet is one of many RRM features, and there are other RRM
features that are not related to Quiet (e.g. neighbor
report). Therefore, requiring Quiet to enable RRM is too
restrictive.
Some of the features, like neighbor report, can be
supported by user space without any help from the kernel.
Hence adding the RRM capabilities IE to association request
should be the sole user space's decision.
Removing the RRM dependency on Quiet in the driver solves
this problem, but using an old driver with a user space
tool that would not require Quiet feature would be
problematic: the user space would add NL80211_ATTR_USE_RRM
in the association request even if the kernel doesn't
advertize NL80211_FEATURE_QUIET and the association would
be denied by the kernel.

This solution adds a global RRM capability, that tells user
space that it can request RRM capabilities IE publishment
without any specific feature support in the kernel.

Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-02-24 09:04:41 +01:00
Lior David
34d505193b cfg80211: basic support for PBSS network type
PBSS (Personal Basic Service Set) is a new BSS type for DMG
networks. It is similar to infrastructure BSS, having an AP-like
entity called PCP (PBSS Control Point), but it has few differences.
PBSS support is mandatory for 11ad devices.

Add support for PBSS by introducing a new PBSS flag attribute.
The PBSS flag is used in the START_AP command to request starting
a PCP instead of an AP, and in the CONNECT command to request
connecting to a PCP instead of an AP.

Signed-off-by: Lior David <liord@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-02-24 09:04:34 +01:00
João Paulo Rechi Vita
d4634e8dea rfkill: Update userspace API documentation
Add a note to userspace on the effect of RFKILL_OP_CHANGE_ALL also
updating the default state for hotplugged devices.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
[reword a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-02-24 09:04:25 +01:00
Dan Williams
4577b06655 nfit: update address range scrub commands to the acpi 6.1 format
The original format of these commands from the "NVDIMM DSM Interface
Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
     "9.20.7 NVDIMM Root Device _DSMs"

Changes include:
1/ New 'restart' fields in ars_status, unfortunately these are
   implemented in the middle of the existing definition so this change
   is not backwards compatible.  The expectation is that shipping
   platforms will only ever support the ACPI 6.1 definition.

2/ New status values for ars_start ('busy') and ars_status ('overflow').

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-02-23 17:17:20 -08:00
Alex Williamson
f572a960a1 vfio/pci: Intel IGD host and LCP bridge config space access
Provide read-only access to PCI config space of the PCI host bridge
and LPC bridge through device specific regions.  This may be used to
configure a VM with matching register contents to satisfy driver
requirements.  Providing this through the vfio file descriptor removes
an additional userspace requirement for access through pci-sysfs and
removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to
the specific devices we're accessing.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22 16:10:09 -07:00
Alex Williamson
5846ff54e8 vfio/pci: Intel IGD OpRegion support
This is the first consumer of vfio device specific resource support,
providing read-only access to the OpRegion for Intel graphics devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22 16:10:09 -07:00
Alex Williamson
c7bb4cb40f vfio: Define device specific region type capability
To this point vfio has only provided an interface to the user that
allows them to determine the number of regions and specifics about
each region.  What the region represents is left to the vfio bus
driver.  vfio-pci chooses to use fixed indexes for fixed resources,
index 0 is BAR0, 1 is BAR1,... 7 is config space, etc.  This works
pretty well since all PCI devices have these regions, even if they
don't necessarily populate all of them.  Then we start to add things
like VGA, which only certain device even support.  We added this the
same way, but now we've wasted a region index, and due to our offset
implementation the corresponding address space, for all devices.

Rather than continuing that process, let's try to make regions self
describing by including a capability that defines their type.  For
vfio-pci we'll make the current VFIO_PCI_NUM_REGIONS fixed, defining
the end of the static indexes and the beginning of self describing
regions.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22 16:10:09 -07:00
Alex Williamson
ff63eb638d vfio: Define sparse mmap capability for regions
We can't always support mmap across an entire device region, for
example we deny mmaps covering the MSI-X table of PCI devices, but
we don't really have a way to report it.  We expect the user to
implicitly know this restriction.  We also can't split the region
because vfio-pci defines an API with fixed region index to BAR
number mapping.  We therefore define a new capability which lists
areas within the region that may be mmap'd.  In addition to the
MSI-X case, this potentially enables in-kernel emulation and
extensions to devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22 16:10:08 -07:00
Alex Williamson
c84982adb2 vfio: Define capability chains
We have a few cases where we need to extend the data returned from the
INFO ioctls in VFIO.  For instance we already have devices exposed
through vfio-pci where VFIO_DEVICE_GET_REGION_INFO reports the region
as mmap-capable, but really only supports sparse mmaps, avoiding the
MSI-X table.  If we wanted to provide in-kernel emulation or extended
functionality for devices, we'd also want the ability to tell the
user not to mmap various regions, rather than forcing them to figure
it out on their own.

Another example is VFIO_IOMMU_GET_INFO.  We'd really like to expose
the actual IOVA capabilities of the IOMMU rather than letting the
user assume the address space they have available to them.  We could
add IOVA base and size fields to struct vfio_iommu_type1_info, but
what if we have multiple IOVA ranges.  For instance x86 uses a range
of addresses at 0xfee00000 for MSI vectors.  These typically are not
available for standard DMA IOVA mappings and splits our available IOVA
space into two ranges.  POWER systems have both an IOVA window below
4G as well as dynamic data window which they can use to remap all of
guest memory.

Representing variable sized arrays within a fixed structure makes it
very difficult to parse, we'd therefore like to put this data beyond
fixed fields within the data structures.  One way to do this is to
emulate capabilities in PCI configuration space.  A new flag indciates
whether capabilties are supported and a new fixed field reports the
offset of the first entry.  Users can then walk the chain to find
capabilities, adding capabilities does not require additional fields
in the fixed structure, and parsing variable sized data becomes
trivial.

This patch outlines the theory and base header structure, which
should be shared by all future users.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22 16:10:08 -07:00
Daniel Borkmann
2f72959a9c bpf: fix csum update in bpf_l4_csum_replace helper for udp
When using this helper for updating UDP checksums, we need to extend
this in order to write CSUM_MANGLED_0 for csum computations that result
into 0 as sum. Reason we need this is because packets with a checksum
could otherwise become incorrectly marked as a packet without a checksum.
Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
not turn packets without a checksum into ones with a checksum.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:07:10 -05:00
Daniel Borkmann
7d672345ed bpf: add generic bpf_csum_diff helper
For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
currently limited to handle 2 and 4 byte changes in a header and feeds the
from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
working with IPv6, for example, this makes it rather cumbersome to deal
with, similarly when editing larger parts of a header.

Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
add a case for header field mask of 0 to change the checksum at a given
offset through inet_proto_csum_replace_by_diff(), and provide a helper
bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
amounts of data.

This can be used in multiple ways: for the bpf_l4_csum_replace() only
part, this even provides us with the option to insert precalculated diffs
from user space f.e. from a map, or from bpf_csum_diff() during runtime.

bpf_csum_diff() has a optional from/to stack buffer input, so we can
calculate a diff by using a scratchbuffer for scenarios where we're
inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
don't need to be of equal size) data. Also, bpf_csum_diff() allows to
feed a previous csum into csum_partial(), so the function can also be
cascaded.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:07:09 -05:00
Alexei Starovoitov
d5a3b1f691 bpf: introduce BPF_MAP_TYPE_STACK_TRACE
add new map type to store stack traces and corresponding helper
bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
@ctx: struct pt_regs*
@map: pointer to stack_trace map
@flags: bits 0-7 - numer of stack frames to skip
        bit 8 - collect user stack instead of kernel
        bit 9 - compare stacks by hash only
        bit 10 - if two different stacks hash into the same stackid
                 discard old
        other bits - reserved
Return: >= 0 stackid on success or negative error

stackid is a 32-bit integer handle that can be further combined with
other data (including other stackid) and used as a key into maps.

Userspace will access stackmap using standard lookup/delete syscall commands to
retrieve full stack trace for given stackid.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20 00:21:44 -05:00
Kan Liang
ac2c7ad0e5 net/ethtool: introduce a new ioctl for per queue setting
Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
The following patches will enable some SUB_COMMANDs for per queue
setting.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 22:54:09 -05:00
Laurent Pinchart
4c4400504f Merge remote-tracking branch 'linuxtv/vsp1' into HEAD 2016-02-20 02:58:43 +02:00
Nikolay Aleksandrov
2125715635 bridge: mdb: add support for more attributes and export timer
Currently mdb entries are exported directly as a structure inside
MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without
breaking user-space. In order to export new mdb fields, I've converted
the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before
with struct br_mdb_entry (without header, as it's casted directly in
iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we
keep compatibility with older users and can export new data.
I've tested this with iproute2, both with and without support for the
added attribute and it works fine.
So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry
inside but it may contain also some additional MDBA_MDB_EATTR_ attributes
such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space.

So the new structure is:
[MDBA_MDB] = {
     [MDBA_MDB_ENTRY] = {
         [MDBA_MDB_ENTRY_INFO]
         [MDBA_MDB_ENTRY_INFO] { <- Nested attribute
             struct br_mdb_entry <- nla_put_nohdr()
             [MDBA_MDB_ENTRY attributes] <- normal netlink attributes
         }
     }
}

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 15:27:36 -05:00