Commit Graph

36499 Commits

Author SHA1 Message Date
Al Viro
7ae9abfd9d ipv4: raw_send_hdrinc(): pass msghdr
Switch from passing msg->iov_iter.iov to passing msg itself

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:13 -05:00
Al Viro
a8866ff6a5 netlink: make the check for "send from tx_ring" deterministic
As it is, zero msg_iovlen means that the first iovec in the kernel
array of iovecs is left uninitialized, so checking if its ->iov_base
is NULL is random.  Since the real users of that thing are doing
sendto(fd, NULL, 0, ...), they are getting msg_iovlen = 1 and
msg_iov[0] = {NULL, 0}, which is what this test is trying to catch.
As suggested by davem, let's just check that msg_iovlen was 1 and
msg_iov[0].iov_base was NULL - _that_ is well-defined and it catches
what we want to catch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:13 -05:00
Markus Elfring
4de46d5ebc netlabel: Less function calls in netlbl_mgmt_add_common() after error detection
The functions "cipso_v4_doi_putdef" and "kfree" could be called in some cases
by the netlbl_mgmt_add_common() function during error handling even if the
passed variables contained still a null pointer.

* This implementation detail could be improved by adjustments for jump labels.

* Let us return immediately after the first failed function call according to
  the current Linux coding style convention.

* Let us delete also an unnecessary check for the variable "entry" there.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 16:22:13 -08:00
Markus Elfring
7a11b1d303 netlabel: Deletion of an unnecessary check before the function call "cipso_v4_doi_free"
The cipso_v4_doi_free() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 16:22:12 -08:00
Markus Elfring
79b7cf60e1 netlabel: Deletion of an unnecessary check before the function call "cipso_v4_doi_putdef"
The cipso_v4_doi_putdef() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03 16:22:12 -08:00
Trond Myklebust
03a9a42a1a SUNRPC: NULL utsname dereference on NFS umount during namespace cleanup
Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.

Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable@vger.kernel.org # 3.18
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-03 16:40:17 -05:00
Trond Myklebust
e2c63e091e Merge branch 'flexfiles'
* flexfiles: (53 commits)
  pnfs: lookup new lseg at lseg boundary
  nfs41: .init_read and .init_write can be called with valid pg_lseg
  pnfs: Update documentation on the Layout Drivers
  pnfs/flexfiles: Add the FlexFile Layout Driver
  nfs: count DIO good bytes correctly with mirroring
  nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
  nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
  nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
  nfs/flexfiles: send layoutreturn before freeing lseg
  nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
  nfs41: allow async version layoutreturn
  nfs41: add range to layoutreturn args
  pnfs: allow LD to ask to resend read through pnfs
  nfs: add nfs_pgio_current_mirror helper
  nfs: only reset desc->pg_mirror_idx when mirroring is supported
  nfs41: add a debug warning if we destroy an unempty layout
  pnfs: fail comparison when bucket verifier not set
  nfs: mirroring support for direct io
  nfs: add mirroring support to pgio layer
  pnfs: pass ds_commit_idx through the commit path
  ...

Conflicts:
	fs/nfs/pnfs.c
	fs/nfs/pnfs.h
2015-02-03 16:01:27 -05:00
Weston Andros Adamson
840210fc48 sunrpc: add rpc_count_iostats_idx
Add a call to tally stats for a task under a different statsidx than
what's contained in the task structure.

This is needed to properly account for pnfs reads/writes when the
DS nfs version != the MDS version.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com>
2015-02-03 11:06:38 -08:00
Trond Myklebust
cc3ea893cb NFS: Client side changes for RDMA
These patches improve the scalability of the NFSoRDMA client and take large
 variables off of the stack.  Additionally, the GFP_* flags are updated to
 match what TCP uses.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUy/21AAoJENfLVL+wpUDrDR4QAJ+aZ1LqO5muJG5LDIReh0p+
 IyvAO3ekWKaaXZd5e/wOviHRRSBCs3oRBm1sNyjOOZYf7nl67RrndjGq2Ccp6tGw
 6V6ligIlhu3QAApFiQmLfpDXQeDBysMfhEQMXDj8kVhAvn3SWwbT6q1Bb8tMZvnu
 lwSPYCuLAdtPzDs3j3Pt3jSrOD7gS06Vtumco1yS82YiLK86E7JiOLb9w5Ltk0Wo
 DT9hZ7JXtflBy3trBsNYQ7GTjoYEp792Ca9DT/nie8/Tuuy3DhooLJKdwVErq/0/
 ycI033mw//RneY+/eF2wLeS4PlzaiSq7DmXKpE5MnkpVuNAdPbHbCuVzuXa4/LLD
 NhKyw/NTAE9ucFJ8e3apj3m42O00bMV0rx0rOUvMMdbNPBKzXk3rrb4uRYfT4k+D
 yss7aEZg0EcDbe9KqfuDdwXVJIZnvAvXD+V1iL4X3QmVM0JyJZouNZPoG7fCwA1g
 uoqGJK9zu7osr59YMomf2okWQFUQYCea5ombXYOo3ZtANw+OYaHDcMKTXaiCtMdA
 CZUEOND8mB/EFtgTMA2TZmDzch8iOOuD7wmtifb8h1DqN/3GxgecBcT5XMCx5zDR
 bMpt9vbCfe4YP6idpl5XEnVbcPG9CZF2W2Hg2gI7axX8R+ZMvwQQlWJjOqxoR6+F
 bpD/jjWFYaS6U7NGyylM
 =YCZe
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-3.20' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: Client side changes for RDMA

These patches improve the scalability of the NFSoRDMA client and take large
variables off of the stack.  Additionally, the GFP_* flags are updated to
match what TCP uses.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-for-3.20' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (21 commits)
  xprtrdma: Update the GFP flags used in xprt_rdma_allocate()
  xprtrdma: Clean up after adding regbuf management
  xprtrdma: Allocate zero pad separately from rpcrdma_buffer
  xprtrdma: Allocate RPC/RDMA receive buffer separately from struct rpcrdma_rep
  xprtrdma: Allocate RPC/RDMA send buffer separately from struct rpcrdma_req
  xprtrdma: Allocate RPC send buffer separately from struct rpcrdma_req
  xprtrdma: Add struct rpcrdma_regbuf and helpers
  xprtrdma: Refactor rpcrdma_buffer_create() and rpcrdma_buffer_destroy()
  xprtrdma: Simplify synopsis of rpcrdma_buffer_create()
  xprtrdma: Take struct ib_qp_attr and ib_qp_init_attr off the stack
  xprtrdma: Take struct ib_device_attr off the stack
  xprtrdma: Free the pd if ib_query_qp() fails
  xprtrdma: Remove rpcrdma_ep::rep_func and ::rep_xprt
  xprtrdma: Move credit update to RPC reply handler
  xprtrdma: Remove rl_mr field, and the mr_chunk union
  xprtrdma: Remove rpcrdma_ep::rep_ia
  xprtrdma: Rename "xprt" and "rdma_connect" fields in struct rpcrdma_xprt
  xprtrdma: Clean up hdrlen
  xprtrdma: Display XIDs in host byte order
  xprtrdma: Modernize htonl and ntohl
  ...
2015-02-03 11:54:58 -05:00
Mika Westerberg
79044f60ca net: rfkill: Add Broadcom BCM2E40 bluetooth ACPI ID
This is yet another Broadcom bluetooth chip with ACPI ID BCM2E40.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-03 14:38:22 +01:00
Johan Hedberg
88d9077c27 Bluetooth: Fix potential NULL dereference
The bnep_get_device function may be triggered by an ioctl just after a
connection has gone down. In such a case the respective L2CAP chan->conn
pointer will get set to NULL (by l2cap_chan_del). This patch adds a
missing NULL check for this case in the bnep_get_device() function.

Reported-by: Patrik Flykt <patrik.flykt@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-03 09:02:12 +01:00
David S. Miller
3ae55826ae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Validate hooks for nf_tables NAT expressions, otherwise users can
   crash the kernel when using them from the wrong hook. We already
   got one user trapped on this when configuring masquerading.

2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported
   by Andreas Schultz.

3) Avoid unnecessary reroute of traffic in the local input path
   in IPVS that triggers a crash in in xfrm. Reported by Florian
   Wiessner and fixes by Julian Anastasov.

4) Fix memory and module refcount leak from the error path of
   nf_tables_newchain().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:30:53 -08:00
Markus Elfring
7d37d0c159 net: sctp: Deletion of an unnecessary check before the function call "kfree"
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:29:43 -08:00
Vlad Yasevich
32dce968dd ipv6: Allow for partial checksums on non-ufo packets
Currntly, if we are not doing UFO on the packet, all UDP
packets will start with CHECKSUM_NONE and thus perform full
checksum computations in software even if device support
IPv6 checksum offloading.

Let's start start with CHECKSUM_PARTIAL if the device
supports it and we are sending only a single packet at
or below mtu size.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:05 -08:00
Vlad Yasevich
03485f2adc udpv6: Add lockless sendmsg() support
This commit adds the same functionaliy to IPv6 that
commit 903ab86d19
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:48 2011 +0000

    udp: Add lockless transmit path

added to IPv4.

UDP transmit path can now run without a socket lock,
thus allowing multiple threads to send to a single socket
more efficiently.
This is only used when corking/MSG_MORE is not used.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Vlad Yasevich
d39d938c82 ipv6: Introduce udpv6_send_skb()
Now that we can individually construct IPv6 skbs to send, add a
udpv6_send_skb() function to populate the udp header and send the
skb.  This allows udp_v6_push_pending_frames() to re-use this
function as well as enables us to add lockless sendmsg() support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Vlad Yasevich
6422398c2a ipv6: introduce ipv6_make_skb
This commit is very similar to
commit 1c32c5ad6f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:47 2011 +0000

    inet: Add ip_make_skb and ip_finish_skb

It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.

The job of ip6_make_skb is to collect messages into an ipv6 packet
and poplulate ipv6 eader.  The job of ip6_finish_skb is to transmit
the generated skb.  Together they replicated the job of
ip6_push_pending_frames() while also provide the capability to be
called independently.  This will be needed to add lockless UDP sendmsg
support.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Vlad Yasevich
0bbe84a67b ipv6: Append sending data to arbitrary queue
Add the ability to append data to arbitrary queue.  This
will be needed later to implement lockless UDP sends.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Vlad Yasevich
366e41d977 ipv6: pull cork initialization into its own function.
Pull IPv6 cork initialization into its own function that
can be re-used.  IPv6 specific cork data did not have an
explicit data structure.  This patch creats eone so that
just ipv6 cork data can be as arguemts.  Also, since
IPv6 tries to save the flow label into inet_cork_full
tructure, pass the full cork.

Adjust ip6_cork_release() to take cork data structures.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:28:04 -08:00
Florian Westphal
843c2fdf7a net: dctcp: loosen requirement to assert ECT(0) during 3WHS
One deployment requirement of DCTCP is to be able to run
in a DC setting along with TCP traffic. As Glenn Judd's
NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls
of TCP in the Datacenter" [1] (tba) explains, one way to
solve this on switch side is to split DCTCP and TCP traffic
in two queues per switch port based on the DSCP: one queue
soley intended for DCTCP traffic and one for non-DCTCP traffic.

For the DCTCP queue, there's the marking threshold K as
explained in commit e3118e8359 ("net: tcp: add DCTCP congestion
control algorithm") for RED marking ECT(0) packets with CE.
For the non-DCTCP queue, there's f.e. a classic tail drop queue.
As already explained in e3118e8359, running DCTCP at scale
when not marking SYN/SYN-ACK packets with ECT(0) has severe
consequences as for non-ECT(0) packets, traversing the RED
marking DCTCP queue will result in a severe reduction of
connection probability.

This is due to the DCTCP queue being dominated by ECT(0) traffic
and switches handle non-ECT traffic in the RED marking queue
after passing K as drops, where K is usually a low watermark
in order to leave enough tailroom for bursts. Splitting DCTCP
traffic among several queues (ECN and non-ECN queue) is being
considered a terrible idea in the network community as it
splits single flows across multiple network paths.

Therefore, commit e3118e8359 implements this on Linux as
ECT(0) marked traffic, as we argue that marking all packets
of a DCTCP flow is the only viable solution and also doesn't
speak against the draft.

However, recently, a DCTCP implementation for FreeBSD hit also
their mainline kernel [2]. In order to let them play well
together with Linux' DCTCP, we would need to loosen the
requirement that ECT(0) has to be asserted during the 3WHS as
not implemented in FreeBSD. This simplifies the ECN test and
lets DCTCP work together with FreeBSD.

Joint work with Daniel Borkmann.

  [1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
  [2] 8ad8794452

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:48:55 -08:00
Willem de Bruijn
b245be1f4d net-timestamp: no-payload only sysctl
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  (v1 -> v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -> v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:46:51 -08:00
Willem de Bruijn
49ca0d8bfa net-timestamp: no-payload option
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes (rfc -> v1)
  - add documentation
  - remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:46:51 -08:00
Christophe Ricard
6095b0f07d NFC: nci: Change NCI state machine to LISTEN_ACTIVE
When receiving an interface activation notification, if
the RF interface is NCI_RF_INTERFACE_NFCEE_DIRECT, we
need to ignore the following parameters and change the NCI
state machine to NCI_LISTEN_ACTIVE. According to the NCI
specification, the parameters should be 0 and shall be
ignored.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:41 +01:00
Christophe Ricard
a41bb8448e NFC: nci: Add RF NFCEE action notification support
The NFCC sends an NCI_OP_RF_NFCEE_ACTION_NTF notification
to the host (DH) to let it know that for example an RF
transaction with a payment reader is done.
For now the notification handler is empty.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:41 +01:00
Christophe Ricard
447b27c4f2 NFC: Forward NFC_EVT_TRANSACTION to user space
NFC_EVT_TRANSACTION is sent through netlink in order for a
specific application running on a secure element to notify
userspace of an event. Typically the secure element application
counterpart on the host could interpret that event and act
upon it.

Forwarded information contains:
- SE host generating the event
- Application IDentifier doing the operation
- Applications parameters

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard
11f54f2286 NFC: nci: Add HCI over NCI protocol support
According to the NCI specification, one can use HCI over NCI
to talk with specific NFCEE. The HCI network is viewed as one
logical NFCEE.
This is needed to support secure element running HCI only
firmwares embedded on an NCI capable chipset, like e.g. the
st21nfcb.
There is some duplication between this piece of code and the
HCI core code, but the latter would need to be abstracted even
more to be able to use NCI as a logical transport for HCP packets.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:40 +01:00
Christophe Ricard
736bb95774 NFC: nci: Support logical connections management
In order to communicate with an NFCEE, we need to open a logical
connection to it, by sending the NCI_OP_CORE_CONN_CREATE_CMD
command to the NFCC. It's left up to the drivers to decide when
to close an already opened logical connection.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:39 +01:00
Christophe Ricard
f7f793f313 NFC: nci: Add NFCEE enabling and disabling support
NFCEEs can be enabled or disabled by sending the
NCI_OP_NFCEE_MODE_SET_CMD command to the NFCC. This patch
provides an API for drivers to enable and disable e.g. their
NCI discoveredd secure elements.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:39 +01:00
Christophe Ricard
af9c8aa67d NFC: nci: Add NFCEE discover support
NFCEEs (NFC Execution Environment) have to be explicitly
discovered by sending the NCI_OP_NFCEE_DISCOVER_CMD
command. The NFCC will respond to this command by telling
us how many NFCEEs are connected to it. Then the NFCC sends
a notification command for each and every NFCEE connected.
Here we implement support for sending
NCI_OP_NFCEE_DISCOVER_CMD command, receiving the response
and the potential notifications.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:38 +01:00
Christophe Ricard
4aeee6871e NFC: nci: Add dynamic logical connections support
The current NCI core only support the RF static connection.
For other NFC features such as Secure Element communication, we
may need to create logical connections to the NFCEE (Execution
Environment.

In order to track each logical connection ID dynamically, we add a
linked list of connection info pointers to the nci_dev structure.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-02-02 21:50:31 +01:00
Johan Hedberg
66f096f791 Bluetooth: Remove mgmt_rp_read_local_oob_ext_data struct
This extended return parameters struct conflicts with the new Read Local
OOB Extended Data command definition. To avoid the conflict simply
rename the old "extended" version to the normal one and update the code
appropriately to take into account the two possible response PDU sizes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 18:27:56 +01:00
J. Bruce Fields
a584143b01 Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20
Christoph's block pnfs patches have some minor dependencies on these
lock patches.
2015-02-02 11:29:29 -05:00
Jakub Pawlowski
4b0e0ceddf Bluetooth: Add restarting to service discovery
When using LE_SCAN_FILTER_DUP_ENABLE, some controllers would send
advertising report from each LE device only once. That means that we
don't get any updates on RSSI value, and makes Service Discovery very
slow. This patch adds restarting scan when in Service Discovery, and
device with filtered uuid is found, but it's not in RSSI range to send
event yet. This way if device moves into range, we will quickly get RSSI
update.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 08:52:34 +01:00
Jakub Pawlowski
2d28cfe7aa Bluetooth: Add le_scan_restart work for LE scan restarting
Currently there is no way to restart le scan, and it's needed in
service scan method. The way it work: it disable, and then enable le
scan on controller.

During the restart, we must remember when the scan was started, and
it's duration, to later re-schedule the le_scan_disable work, that was
stopped during the stop scan phase.

Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-02 08:52:33 +01:00
Roopa Prabhu
68e331c785 bridge: offload bridge port attributes to switch asic if feature flag set
This patch adds support to set/del bridge port attributes in hardware from
the bridge driver.

With this, when the user sends a bridge setlink message with no flags or
master flags set,
   - the bridge driver ndo_bridge_setlink handler sets settings in the kernel
   - calls the swicthdev api to propagate the attrs to the switchdev
	hardware

   You can still use the self flag to go to the switch hw or switch port
   driver directly.

With this, it also makes sure a notification goes out only after the
attributes are set both in the kernel and hw.

The patch calls switchdev api only if BRIDGE_FLAGS_SELF is not set.
This is because the offload cases with BRIDGE_FLAGS_SELF are handled in
the caller (in rtnetlink.c).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:16:34 -08:00
Roopa Prabhu
8a44dbb202 swdevice: add new apis to set and del bridge port attributes
This patch adds two new api's netdev_switch_port_bridge_setlink
and netdev_switch_port_bridge_dellink to offload bridge port attributes
to switch port

(The names of the apis look odd with 'switch_port_bridge',
but am more inclined to change the prefix of the api to something else.
Will take any suggestions).

The api's look at the NETIF_F_HW_SWITCH_OFFLOAD feature flag to
pass bridge port attributes to the port device.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:16:34 -08:00
Roopa Prabhu
add511b382 bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink
bridge flags are needed inside ndo_bridge_setlink/dellink handlers to
avoid another call to parse IFLA_AF_SPEC inside these handlers

This is used later in this series

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:16:33 -08:00
Eric Dumazet
bdbbb8527b ipv4: tcp: get rid of ugly unicast_sock
In commit be9f4a44e7 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :

commit 3a7c384ffd ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.

commit 0980e56e50 ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac4 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.

This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01 23:06:19 -08:00
Marcel Holtmann
bf21d7931a Bluetooth: Fix OOB data present for BR/EDR Secure Connections Only mode
When using Secure Connections Only mode, then only P-256 OOB data is
valid and should be provided. In case userspace provides P-192 and P-256
OOB data, then the P-192 values will be set to zero. However the present
value of the IO capability exchange still mentioned that both values
would be available. Fix this by telling the controller clearly that only
the P-256 OOB data is present.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 11:52:54 +02:00
Marcel Holtmann
6858bcd073 Bluetooth: Expose remote OOB information as debugfs entry
For debugging purposes it is good to know which OOB data is actually
currently loaded for each controller. So expose that list via debugfs.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 09:15:21 +02:00
Marcel Holtmann
5789f37cbc Bluetooth: Expose hardware error code as debugfs entry
When the Hardware Error event is send by the controller, the Bluetooth
core stores the error code. Expose it via debugfs so it can be retrieved
later on.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 09:14:55 +02:00
Marcel Holtmann
0886aea6ac Bluetooth: Expose debug keys usage setting via debugfs
To allow easier debugging when debug keys are generated, provide debugfs
entry for checking the setting of debug keys usage.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 09:14:19 +02:00
Marcel Holtmann
c50b33c80e Bluetooth: Track changes from HCI Write Simple Pairing Debug Mode command
When the HCI Write Simple Pairing Debug Mode command has been issued,
the result needs to be tracked and stored. The hdev->ssp_debug_mode
variable is already present, but was never updated when the mode in
the controller was actually changed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 09:13:23 +02:00
Marcel Holtmann
6e07231a80 Bluetooth: Expose Secure Simple Pairing debug mode setting in debugfs
The value of the ssp_debug_mode should be accessible via debugfs to be
able to determine if a BR/EDR controller generates debugs keys or not.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-02-01 09:12:56 +02:00
Eric Dumazet
349c9e3c73 ipv4: icmp: use percpu allocation
Get rid of nr_cpu_ids and use modern percpu allocation.

Note that the sockets themselves are not yet allocated
using NUMA affinity.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31 17:48:18 -08:00
Kenneth Klette Jonassen
932eb7638a tcp: use SACK RTTs for CC
Current behavior only passes RTTs from sequentially acked data to CC.

If sender gets a combined ACK for segment 1 and SACK for segment 3, then the
computed RTT for CC is the time between sending segment 1 and receiving SACK
for segment 3.

Pass the minimum computed RTT from any acked data to CC, i.e. time between
sending segment 3 and receiving SACK for segment 3.

Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31 17:25:37 -08:00
Marcel Holtmann
41bcfd50d5 Bluetooth: Allow remote OOB data to only provide P-192 or P-256 values
In case the remote only provided P-192 or P-256 data for OOB pairing,
then make sure that the data value pointers are correctly set. That way
the core can provide correct information when remote OOB data present
information have to be communicated.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-31 21:26:14 +01:00
Marcel Holtmann
4775a4ea14 Bluetooth: Fix OOB data present value for SMP pairing
Before setting the OOB data present flag with SMP pairing, check the
newly introduced present tracking that actual OOB data values have
been provided. The existence of remote OOB data structure does not
actually mean that the correct data values are available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-31 21:26:14 +01:00
Marcel Holtmann
659c7fb084 Bluetooth: Fix OOB data present value for BR/EDR Secure Connections
When BR/EDR Secure Connections has been enabled, the OOB data present
value can take 2 additional values. The host has to clearly provide
details about if P-192 OOB data, P-256 OOB data or a combination of
P-192 and P-256 OOB data is present.

In case BR/EDR Secure Connections is not enabled or not supported,
then check that P-192 OOB data is actually present and return the
correct value based on that.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-31 21:26:12 +01:00
Marcel Holtmann
f7697b1602 Bluetooth: Store OOB data present value for each set of remote OOB data
Instead of doing complex calculation every time the OOB data is used,
just calculate the OOB data present value and store it with the OOB
data raw values.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-31 09:59:45 +02:00
Nicholas Mc Guire
a994a09097 irda: use msecs_to_jiffies for conversions
This is only an API consolidation and should make things more readable
it replaces  var * HZ / 1000  constructs by  msecs_to_jiffies(var).

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 18:08:25 -08:00
Toshiaki Makita
d4bcef3fbe net: Fix vlan_get_protocol for stacked vlan
vlan_get_protocol() could not get network protocol if a skb has a 802.1ad
vlan tag or multiple vlans, which caused incorrect checksum calculation
in several drivers.

Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
vlan protocol.

As the logic is the same as skb_network_protocol(), create a common helper
function __vlan_get_protocol() and call it from existing functions.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 18:03:47 -08:00
Daniel Borkmann
207895fd38 net: mark some potential candidates __read_mostly
They are all either written once or extremly rarely (e.g. from init
code), so we can move them to the .data..read_mostly section.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 17:58:39 -08:00
Saran Maruti Ramanara
cfbf654efc net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
When making use of RFC5061, section 4.2.4. for setting the primary IP
address, we're passing a wrong parameter header to param_type2af(),
resulting always in NULL being returned.

At this point, param.p points to a sctp_addip_param struct, containing
a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
the actual sctp_addr_param, which also contains a sctp_paramhdr, but
this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
param_type2af() can make use of. Since we already hold a pointer to
addr_param from previous line, just reuse it for param_type2af().

Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 17:45:23 -08:00
Pablo Neira
8b7c36d810 netlink: fix wrong subscription bitmask to group mapping in
The subscription bitmask passed via struct sockaddr_nl is converted to
the group number when calling the netlink_bind() and netlink_unbind()
callbacks.

The conversion is however incorrect since bitmask (1 << 0) needs to be
mapped to group number 1. Note that you cannot specify the group number 0
(usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP
since this is rejected through -EINVAL.

This problem became noticeable since 97840cb ("netfilter: nfnetlink:
fix insufficient validation in nfnetlink_bind") when binding to bitmask
(1 << 0) in ctnetlink.

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 17:43:47 -08:00
Patrick McHardy
4c1017aa80 netfilter: nft_lookup: add missing attribute validation for NFTA_LOOKUP_SET_ID
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 19:08:20 +01:00
Arturo Borrero
5191f4d82d netfilter: nft_compat: add ebtables support
This patch extends nft_compat to support ebtables extensions.

ebtables verdict codes are translated to the ones used by the nf_tables engine,
so we can properly use ebtables target extensions from nft_compat.

This patch extends previous work by Giuseppe Longo <giuseppelng@gmail.com>.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 19:07:59 +01:00
Pablo Neira Ayuso
f5553c19ff netfilter: nf_tables: fix leaks in error path of nf_tables_newchain()
Release statistics and module refcount on memory allocation problems.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-30 18:42:08 +01:00
Chuck Lever
a0a1d50cd1 xprtrdma: Update the GFP flags used in xprt_rdma_allocate()
Reflect the more conservative approach used in the socket transport's
version of this transport method. An RPC buffer allocation should
avoid forcing not just FS activity, but any I/O.

In particular, two recent changes missed updating xprtrdma:

 - Commit c6c8fe79a8 ("net, sunrpc: suppress allocation warning ...")
 - Commit a564b8f039 ("nfs: enable swap on NFS")

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 12:18:48 -05:00
Chuck Lever
df515ca7b3 xprtrdma: Clean up after adding regbuf management
rpcrdma_{de}register_internal() are used only in verbs.c now.

MAX_RPCRDMAHDR is no longer used and can be removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
c05fbb5a59 xprtrdma: Allocate zero pad separately from rpcrdma_buffer
Use the new rpcrdma_alloc_regbuf() API to shrink the amount of
contiguous memory needed for a buffer pool by moving the zero
pad buffer into a regbuf.

This is for consistency with the other uses of internally
registered memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
6b1184cd4f xprtrdma: Allocate RPC/RDMA receive buffer separately from struct rpcrdma_rep
The rr_base field is currently the buffer where RPC replies land.

An RPC/RDMA reply header lands in this buffer. In some cases an RPC
reply header also lands in this buffer, just after the RPC/RDMA
header.

The inline threshold is an agreed-on size limit for RDMA SEND
operations that pass from server and client. The sum of the
RPC/RDMA reply header size and the RPC reply header size must be
less than this threshold.

The largest RDMA RECV that the client should have to handle is the
size of the inline threshold. The receive buffer should thus be the
size of the inline threshold, and not related to RPCRDMA_MAX_SEGS.

RPC replies received via RDMA WRITE (long replies) are caught in
rq_rcv_buf, which is the second half of the RPC send buffer. Ie,
such replies are not involved in any way with rr_base.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
85275c874e xprtrdma: Allocate RPC/RDMA send buffer separately from struct rpcrdma_req
The rl_base field is currently the buffer where each RPC/RDMA call
header is built.

The inline threshold is an agreed-on size limit to for RDMA SEND
operations that pass between client and server. The sum of the
RPC/RDMA header size and the RPC header size must be less than or
equal to this threshold.

Increasing the r/wsize maximum will require MAX_SEGS to grow
significantly, but the inline threshold size won't change (both
sides agree on it). The server's inline threshold doesn't change.

Since an RPC/RDMA header can never be larger than the inline
threshold, make all RPC/RDMA header buffers the size of the
inline threshold.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
0ca77dc372 xprtrdma: Allocate RPC send buffer separately from struct rpcrdma_req
Because internal memory registration is an expensive and synchronous
operation, xprtrdma pre-registers send and receive buffers at mount
time, and then re-uses them for each RPC.

A "hardway" allocation is a memory allocation and registration that
replaces a send buffer during the processing of an RPC. Hardway must
be done if the RPC send buffer is too small to accommodate an RPC's
call and reply headers.

For xprtrdma, each RPC send buffer is currently part of struct
rpcrdma_req so that xprt_rdma_free(), which is passed nothing but
the address of an RPC send buffer, can find its matching struct
rpcrdma_req and rpcrdma_rep quickly via container_of / offsetof.

That means that hardway currently has to replace a whole rpcrmda_req
when it replaces an RPC send buffer. This is often a fairly hefty
chunk of contiguous memory due to the size of the rl_segments array
and the fact that both the send and receive buffers are part of
struct rpcrdma_req.

Some obscure re-use of fields in rpcrdma_req is done so that
xprt_rdma_free() can detect replaced rpcrdma_req structs, and
restore the original.

This commit breaks apart the RPC send buffer and struct rpcrdma_req
so that increasing the size of the rl_segments array does not change
the alignment of each RPC send buffer. (Increasing rl_segments is
needed to bump up the maximum r/wsize for NFS/RDMA).

This change opens up some interesting possibilities for improving
the design of xprt_rdma_allocate().

xprt_rdma_allocate() is now the one place where RPC send buffers
are allocated or re-allocated, and they are now always left in place
by xprt_rdma_free().

A large re-allocation that includes both the rl_segments array and
the RPC send buffer is no longer needed. Send buffer re-allocation
becomes quite rare. Good send buffer alignment is guaranteed no
matter what the size of the rl_segments array is.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
9128c3e794 xprtrdma: Add struct rpcrdma_regbuf and helpers
There are several spots that allocate a buffer via kmalloc (usually
contiguously with another data structure) and then register that
buffer internally. I'd like to split the buffers out of these data
structures to allow the data structures to scale.

Start by adding functions that can kmalloc and register a buffer,
and can manage/preserve the buffer's associated ib_sge and ib_mr
fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:49 -05:00
Chuck Lever
1392402c40 xprtrdma: Refactor rpcrdma_buffer_create() and rpcrdma_buffer_destroy()
Move the details of how to create and destroy rpcrdma_req and
rpcrdma_rep structures into helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
ac920d04a7 xprtrdma: Simplify synopsis of rpcrdma_buffer_create()
Clean up: There is one call site for rpcrdma_buffer_create(). All of
the arguments there are fields of an rpcrdma_xprt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
ce1ab9ab47 xprtrdma: Take struct ib_qp_attr and ib_qp_init_attr off the stack
Reduce stack footprint of the connection upcall handler function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
7bc7972cdd xprtrdma: Take struct ib_device_attr off the stack
Device attributes are large, and are used in more than one place.
Stash a copy in dynamically allocated memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
5ae711a246 xprtrdma: Free the pd if ib_query_qp() fails
If ib_query_qp() fails or the memory registration mode isn't
supported, don't leak the PD. An orphaned IB/core resource will
cause IB module removal to hang.

Fixes: bd7ed1d133 ("RPC/RDMA: check selected memory registration ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
afadc468eb xprtrdma: Remove rpcrdma_ep::rep_func and ::rep_xprt
Clean up: The rep_func field always refers to rpcrdma_conn_func().
rep_func should have been removed by commit b45ccfd25d ("xprtrdma:
Remove MEMWINDOWS registration modes").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
eba8ff660b xprtrdma: Move credit update to RPC reply handler
Reduce work in the receive CQ handler, which can be run at hardware
interrupt level, by moving the RPC/RDMA credit update logic to the
RPC reply handler.

This has some additional benefits: More header sanity checking is
done before trusting the incoming credit value, and the receive CQ
handler no longer touches the RPC/RDMA header (the CPU stalls while
waiting for the header contents to be brought into the cache).

This further extends work begun by commit e7ce710a88 ("xprtrdma:
Avoid deadlock when credit window is reset").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
3eb3581066 xprtrdma: Remove rl_mr field, and the mr_chunk union
Clean up: Since commit 0ac531c183 ("xprtrdma: Remove REGISTER
memory registration mode"), the rl_mr pointer is no longer used
anywhere.

After removal, there's only a single member of the mr_chunk union,
so mr_chunk can be removed as well, in favor of a single pointer
field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
5d410ba061 xprtrdma: Remove rpcrdma_ep::rep_ia
Clean up: This field is not used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
5abefb861f xprtrdma: Rename "xprt" and "rdma_connect" fields in struct rpcrdma_xprt
Clean up: Use consistent field names in struct rpcrdma_xprt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
f2846481b4 xprtrdma: Clean up hdrlen
Clean up: Replace naked integers with a documenting macro.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
052151a979 xprtrdma: Display XIDs in host byte order
xprtsock.c and the backchannel code display XIDs in host byte order.
Follow suit in xprtrdma.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
284f4902a6 xprtrdma: Modernize htonl and ntohl
Clean up: Replace htonl and ntohl with the be32 equivalents.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:48 -05:00
Chuck Lever
8502427ccd xprtrdma: human-readable completion status
Make it easier to grep the system log for specific error conditions.

The wc.opcode field is not included because opcode numbers are
sparse, and because wc.opcode is not necessarily valid when
completion reports an error.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-01-30 10:47:47 -05:00
Julian Anastasov
579eb62ac3 ipvs: rerouting to local clients is not needed anymore
commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-01-30 10:05:55 +09:00
Li Wei
3cdaa5be9e ipv4: Don't increase PMTU with Datagram Too Big message.
RFC 1191 said, "a host MUST not increase its estimate of the Path
MTU in response to the contents of a Datagram Too Big message."

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 15:28:59 -08:00
Salam Noureddine
7866a62104 dev: add per net_device packet type chains
When many pf_packet listeners are created on a lot of interfaces the
current implementation using global packet type lists scales poorly.
This patch adds per net_device packet type lists to fix this problem.

The patch was originally written by Eric Biederman for linux-2.6.29.
Tested on linux-3.16.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 14:41:39 -08:00
Nicolas Dichtel
7b4ce694b2 rtnetlink: pass link_net to the newlink handler
When IFLA_LINK_NETNSID is used, the netdevice should be built in this link netns
and moved at the end to another netns (pointed by the socket netns or
IFLA_NET_NS_[PID|FD]).

Existing user of the newlink handler will use the netns argument (src_net) to
find a link netdevice or to check some other information into the link netns.
For example, to find a netdevice, two information are required: an ifindex
(usually from IFLA_LINK) and a netns (this link netns).

Note: when using IFLA_LINK_NETNSID and IFLA_NET_NS_[PID|FD], a user may create a
netdevice that stands in netnsX and with its link part in netnsY, by sending a
rtnl message from netnsZ.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 14:23:25 -08:00
Nicolas Dichtel
8997c27ec4 caif: remove wrong dev_net_set() call
src_net points to the netns where the netlink message has been received. This
netns may be different from the netns where the interface is created (because
the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.

It seems wrong to override the netns in the newlink() handler because if it
was not already src_net, it means that the user explicitly asks to create the
netdevice in another netns.

CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Fixes: 8391c4aab1 ("caif: Bugfixes in CAIF netdevice for close and flow control")
Fixes: c412540063 ("caif-hsi: Add rtnl support")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 14:20:02 -08:00
Szymon Janc
ac363cf9eb Bluetooth: Fix sending Read Remote Extended Features command
This command should only be used if remote device reports that it
supports extended features. Otherwise command will fail and connection
will be dropped.

Some devices support SSP but don't support extended features so
current check for SSP support is not enought.

Instead of checking for SSP support just check if both ends support
Extended Feature.

< HCI Command: Create Connection (0x01|0x0005) plen 13
        Address: D0:9C:30:00:19:6F (Foster Electric Company, Limited)
        Packet type: 0xcc18
          DM1 may be used
          DH1 may be used
          DM3 may be used
          DH3 may be used
          DM5 may be used
          DH5 may be used
        Page scan repetition mode: R1 (0x01)
        Page scan mode: Mandatory (0x00)
        Clock offset: 0x94c8
        Role switch: Allow slave (0x01)
> HCI Event: Command Status (0x0f) plen 4
      Create Connection (0x01|0x0005) ncmd 1
        Status: Success (0x00)
> HCI Event: Connect Complete (0x03) plen 11
        Status: Success (0x00)
        Handle: 5
        Address: D0:9C:30:00:19:6F (Foster Electric Company, Limited)
        Link type: ACL (0x01)
        Encryption: Disabled (0x00)
< HCI Command: Read Remote Supported Features (0x01|0x001b) plen 2
        Handle: 5
> HCI Event: Command Status (0x0f) plen 4
      Read Remote Supported Features (0x01|0x001b) ncmd 1
        Status: Success (0x00)
> HCI Event: Page Scan Repetition Mode Change (0x20) plen 7
        Address: D0:9C:30:00:19:6F (Foster Electric Company, Limited)
        Page scan repetition mode: R1 (0x01)
> HCI Event: Read Remote Supported Features (0x0b) plen 11
        Status: Success (0x00)
        Handle: 5
        Features: 0xff 0xff 0x8f 0xfe 0xdb 0xff 0x5b 0x07
          3 slot packets
          5 slot packets
          Encryption
          Slot offset
          Timing accuracy
          Role switch
          Hold mode
          Sniff mode
          Park state
          Power control requests
          Channel quality driven data rate (CQDDR)
          SCO link
          HV2 packets
          HV3 packets
          u-law log synchronous data
          A-law log synchronous data
          CVSD synchronous data
          Paging parameter negotiation
          Power control
          Transparent synchronous data
          Broadcast Encryption
          Enhanced Data Rate ACL 2 Mbps mode
          Enhanced Data Rate ACL 3 Mbps mode
          Enhanced inquiry scan
          Interlaced inquiry scan
          Interlaced page scan
          RSSI with inquiry results
          Extended SCO link (EV3 packets)
          EV4 packets
          EV5 packets
          AFH capable slave
          AFH classification slave
          LE Supported (Controller)
          3-slot Enhanced Data Rate ACL packets
          5-slot Enhanced Data Rate ACL packets
          Sniff subrating
          Pause encryption
          AFH capable master
          AFH classification master
          Enhanced Data Rate eSCO 2 Mbps mode
          Enhanced Data Rate eSCO 3 Mbps mode
          3-slot Enhanced Data Rate eSCO packets
          Extended Inquiry Response
          Simultaneous LE and BR/EDR (Controller)
          Secure Simple Pairing
          Encapsulated PDU
          Non-flushable Packet Boundary Flag
          Link Supervision Timeout Changed Event
          Inquiry TX Power Level
          Enhanced Power Control
< HCI Command: Read Remote Extended Features (0x01|0x001c) plen 3
        Handle: 5
        Page: 1
> HCI Event: Command Status (0x0f) plen 4
      Read Remote Extended Features (0x01|0x001c) ncmd 1
        Status: Command Disallowed (0x0c)
< HCI Command: Read Clock Offset (0x01|0x001f) plen 2
        Handle: 5
> HCI Event: Command Status (0x0f) plen 4
      Read Clock Offset (0x01|0x001f) ncmd 1
        Status: Success (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3
        Handle: 5
        Reason: Remote User Terminated Connection (0x13)

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-29 16:59:53 +01:00
Eric Dumazet
811230cd85 tcp: ipv4: initialize unicast_sock sk_pacing_rate
When I added sk_pacing_rate field, I forgot to initialize its value
in the per cpu unicast_sock used in ip_send_unicast_reply()

This means that for sch_fq users, RST packets, or ACK packets sent
on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
once we reach the per flow limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 95bd09eb27 ("tcp: TSO packets automatic sizing")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:24:47 -08:00
Eric Dumazet
86b3bfe914 pkt_sched: fq: remove useless TIME_WAIT check
TIME_WAIT sockets are not owning any skb.

ip_send_unicast_reply() and tcp_v6_send_response() both use
regular sockets.

We can safely remove a test in sch_fq and save one cache line miss,
as sk_state is far away from sk_pacing_rate.

Tested at Google for about one year.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:23:57 -08:00
Arnd Bergmann
2dbce096ca act_connmark: fix dependencies better
NET_ACT_CONNMARK fails to build if NF_CONNTRACK_MARK is disabled,
and d7924450e1 ("act_connmark: Add missing dependency on
NF_CONNTRACK_MARK") fixed that case, but missed the cased where
NF_CONNTRACK is a loadable module.

This adds the second dependency to ensure that NET_ACT_CONNMARK
can only be built-in if NF_CONNTRACK is also part of the kernel
rather than a loadable module.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:23:06 -08:00
Christoph Hellwig
7cc0566268 net: remove sock_iocb
The sock_iocb structure is allocate on stack for each read/write-like
operation on sockets, and contains various fields of which only the
embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
Get rid of the sock_iocb and put a msghdr directly on the stack and pass
the scm_cookie explicitly to netlink_mmap_sendmsg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:15:07 -08:00
Jesse Gross
b8693877ae openvswitch: Add support for checksums on UDP tunnels.
Currently, it isn't possible to request checksums on the outer UDP
header of tunnels - the TUNNEL_CSUM flag is ignored. This adds
support for requesting that UDP checksums be computed on transmit
and properly reported if they are present on receive.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 23:04:15 -08:00
David S. Miller
b8f8be3f04 NFC: 3.20 first pull request
This is the first NFC pull request for 3.20.
 
 With this one we have:
 
 - Secure element support for the ST Micro st21nfca driver. This depends
   on a few HCI internal changes in order for example to support more
   than one secure element per controller.
 
 - ACPI support for NXP's pn544 HCI driver. This controller is found on
   many x86 SoCs and is typically enumerated on the ACPI bus there.
 
 - A few st21nfca and st21nfcb fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUyXy7AAoJEIqAPN1PVmxKOY4P/1WJtEZfzBOFMlh9qeA8YESv
 cS1xbZuAVeEJ3r/sgDc87iA/4MMNZDzfhDHQlQJs/pJWAXE3Am1/dZGWHJC6pbk/
 roTlbEh5OvQU8cRIdAvOcEgrBIWk+E30Mkd2OtOMpWyhbgChN7hKz0KUs7znVHBJ
 G3YCcZOPr7K2ra78UlYzApvGqoxiVaiEWQyj6rzx2HbVzxzICL6A5m9cRcZuGxYR
 sK3Y/DpKJKGwH3p1kkLIOqHy3nGhq2LttVSXF/f5xkOB8teERSkt4i8aJBiSb9ym
 a++iAWp2syH3sGh2rsb3G4KRYCYq2J9mJD7oBB3G/6UoNwyeSgW3GdxgH/yxt6C9
 KfzHm9T8jfvQIBlf4lGeboV8zu+ysQehJFNKtxdQDlvwqPiR14clT5JFejocr9+N
 SegCbF+LJaZW8boOZVvhxTxASpEZ0RTFwUIkKKxtXOpK4ha0s1gEAtuZUpTYmRtJ
 wGqds8wszkE0wOdZJi7dsCGpp5JNI0LZZCsuKa6Ko3E7LkTAor8Bbmob0RR51ClD
 srtoz6kJgoozAMRLMISXBimk0geOp38iGs26GPSpRHwSV/1loN6+abaOMYlGbDCq
 +oVpqMmJsS/z5US5+wEkWU+O4y9TS0O74d/TxxcwUM+CLNyKI+Ms1rMOyYi0domo
 2xa7MuDCrmztQbs6ROKT
 =SNWp
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

NFC: 3.20 first pull request

This is the first NFC pull request for 3.20.

With this one we have:

- Secure element support for the ST Micro st21nfca driver. This depends
  on a few HCI internal changes in order for example to support more
  than one secure element per controller.

- ACPI support for NXP's pn544 HCI driver. This controller is found on
  many x86 SoCs and is typically enumerated on the ACPI bus there.

- A few st21nfca and st21nfcb fixes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:49:55 -08:00
Roopa Prabhu
59ccaaaa49 bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081

This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
handler does not return any bytes in the skb.

Alternately, the skb->len check can be moved inside rtnl_notify.

For the bridge vlan case described in 92081, there is also a fix needed
in bridge driver to generate a proper notification. Will fix that in
subsequent patch.

v2: rebase patch on net tree

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:21:31 -08:00
Neal Cardwell
d6b1a8a92a tcp: fix timing issue in CUBIC slope calculation
This patch fixes a bug in CUBIC that causes cwnd to increase slightly
too slowly when multiple ACKs arrive in the same jiffy.

If cwnd is supposed to increase at a rate of more than once per jiffy,
then CUBIC was sometimes too slow. Because the bic_target is
calculated for a future point in time, calculated with time in
jiffies, the cwnd can increase over the course of the jiffy while the
bic_target calculated as the proper CUBIC cwnd at time
t=tcp_time_stamp+rtt does not increase, because tcp_time_stamp only
increases on jiffy tick boundaries.

So since the cnt is set to:
	ca->cnt = cwnd / (bic_target - cwnd);
as cwnd increases but bic_target does not increase due to jiffy
granularity, the cnt becomes too large, causing cwnd to increase
too slowly.

For example:
- suppose at the beginning of a jiffy, cwnd=40, bic_target=44
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10
- suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai()
   increases cwnd to 41
- so CUBIC sets:
   ca->cnt =  cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41 / 3 = 13

So now CUBIC will wait for 13 packets to be ACKed before increasing
cwnd to 42, insted of 10 as it should.

The fix is to avoid adjusting the slope (determined by ca->cnt)
multiple times within a jiffy, and instead skip to compute the Reno
cwnd, the "TCP friendliness" code path.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell
9cd981dcf1 tcp: fix stretch ACK bugs in CUBIC
Change CUBIC to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, because we are now precisely accounting for stretch ACKs,
including delayed ACKs, we can now remove the delayed ACK tracking and
estimation code that tracked recent delayed ACK behavior in
ca->delayed_ack.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell
c22bdca947 tcp: fix stretch ACK bugs in Reno
Change Reno to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, if snd_cwnd crosses snd_ssthresh during slow start
processing, and we then exit slow start mode, we need to carry over
any remaining "credit" for packets ACKed and apply that to additive
increase by passing this remaining "acked" count to
tcp_cong_avoid_ai().

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:38 -08:00
Neal Cardwell
814d488c61 tcp: fix the timid additive increase on stretch ACKs
tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on
"stretch ACKs" -- cases where the receiver ACKed more than 1 packet in
a single ACK. For example, suppose w is 10 and we get a stretch ACK
for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2
(since acked/w = 20/10 = 2), but instead we were only increasing cwnd
by 1. This patch fixes that behavior.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:37 -08:00
Neal Cardwell
e73ebb0881 tcp: stretch ACK fixes prep
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
cover more than the RFC-specified maximum of 2 packets. These stretch
ACKs can cause serious performance shortfalls in common congestion
control algorithms that were designed and tuned years ago with
receiver hosts that were not using LRO or GRO, and were instead
politely ACKing every other packet.

This patch series fixes Reno and CUBIC to handle stretch ACKs.

This patch prepares for the upcoming stretch ACK bug fix patches. It
adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
changes all congestion control algorithms to pass in 1 for the ACKed
count. It also changes tcp_slow_start() to return the number of packet
ACK "credits" that were not processed in slow start mode, and can be
processed by the congestion control module in additive increase mode.

In future patches we will fix tcp_cong_avoid_ai() to handle stretch
ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
and additive increase mode.

Reported-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:18:37 -08:00
Marcel Holtmann
64dae967ca Bluetooth: Move smp_unregister() into hci_dev_do_close() function
The smp_unregister() function needs to be called every time the
controller is powered down. There are multiple entry points when
this can happen. One is "hciconfig hci0 reset" which will throw
a WARN_ON when LE support has been enabled.

[   78.564620] WARNING: CPU: 0 PID: 148 at net/bluetooth/smp.c:3075 smp_register+0xf1/0x170()
[   78.564622] Modules linked in:
[   78.564628] CPU: 0 PID: 148 Comm: kworker/u3:1 Not tainted 3.19.0-rc4-devel+ #404
[   78.564629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
[   78.564635] Workqueue: hci0 hci_rx_work
[   78.564638]  ffffffff81b4a7a2 ffff88001cb2fb38 ffffffff8161d881 0000000080000000
[   78.564642]  0000000000000000 ffff88001cb2fb78 ffffffff8103b870 696e55206e6f6f6d
[   78.564645]  ffff88001d965000 0000000000000000 0000000000000000 ffff88001d965000
[   78.564648] Call Trace:
[   78.564655]  [<ffffffff8161d881>] dump_stack+0x4f/0x7b
[   78.564662]  [<ffffffff8103b870>] warn_slowpath_common+0x80/0xc0
[   78.564667]  [<ffffffff81544b00>] ? add_uuid+0x1f0/0x1f0
[   78.564671]  [<ffffffff8103b955>] warn_slowpath_null+0x15/0x20
[   78.564674]  [<ffffffff81562d81>] smp_register+0xf1/0x170
[   78.564680]  [<ffffffff81081236>] ? lock_timer_base.isra.30+0x26/0x50
[   78.564683]  [<ffffffff81544bf0>] powered_complete+0xf0/0x120
[   78.564688]  [<ffffffff8152e622>] hci_req_cmd_complete+0x82/0x260
[   78.564692]  [<ffffffff8153554f>] hci_cmd_complete_evt+0x6cf/0x2e20
[   78.564697]  [<ffffffff81623e43>] ? _raw_spin_unlock_irqrestore+0x13/0x30
[   78.564701]  [<ffffffff8106b0af>] ? __wake_up_sync_key+0x4f/0x60
[   78.564705]  [<ffffffff8153a2ab>] hci_event_packet+0xbcb/0x2e70
[   78.564709]  [<ffffffff814094d3>] ? skb_release_all+0x23/0x30
[   78.564711]  [<ffffffff81409529>] ? kfree_skb+0x29/0x40
[   78.564715]  [<ffffffff815296c8>] hci_rx_work+0x1c8/0x3f0
[   78.564719]  [<ffffffff8105bd91>] ? get_parent_ip+0x11/0x50
[   78.564722]  [<ffffffff8105be25>] ? preempt_count_add+0x55/0xb0
[   78.564727]  [<ffffffff8104f65f>] process_one_work+0x12f/0x360
[   78.564731]  [<ffffffff8104ff9b>] worker_thread+0x6b/0x4b0
[   78.564735]  [<ffffffff8104ff30>] ? cancel_delayed_work_sync+0x10/0x10
[   78.564738]  [<ffffffff810542fa>] kthread+0xea/0x100
[   78.564742]  [<ffffffff81620000>] ? __schedule+0x3e0/0x980
[   78.564745]  [<ffffffff81054210>] ? kthread_create_on_node+0x180/0x180
[   78.564749]  [<ffffffff816246ec>] ret_from_fork+0x7c/0xb0
[   78.564752]  [<ffffffff81054210>] ? kthread_create_on_node+0x180/0x180
[   78.564755] ---[ end trace 8b0d943af76d3736 ]---

This warning is not critical and has only been placed in the code to
actually catch this exact situation. To avoid triggering it move
the smp_unregister() into hci_dev_do_close() which will now also
take care of remove the SMP channel. It is safe to call this function
since it only remove the channel if it has been previously registered.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-29 07:53:42 +02:00
Marcel Holtmann
c7741d16a5 Bluetooth: Perform a power cycle when receiving hardware error event
When receiving a HCI Hardware Error event, the controller should be
assumed to be non-functional until issuing a HCI Reset command.

The Bluetooth hardware errors are vendor specific and so add a
new hdev->hw_error callback that drivers can provide to run extra
code to handle the hardware error.

After completing the vendor specific error handling perform a full
reset of the Bluetooth stack by closing and re-opening the transport.

Based-on-patch-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:24 +01:00
Marcel Holtmann
5c912495b7 Bluetooth: Introduce hci_dev_do_reset helper function
Split the hci_dev_reset ioctl handling into using hci_dev_do_reset
helper function. Similar to what has been done with hci_dev_do_open
and hci_dev_do_close.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:24 +01:00
Johan Hedberg
8f502f847a Bluetooth: Fix notifying discovery state when powering off
The discovery state should be set to stopped when the HCI device is
powered off. This patch adds the appropriate call to the
hci_discovery_set_state() function from hci_dev_do_close() which is
responsible for the power-off procedure.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-28 21:26:23 +01:00
Johan Hedberg
39c5d970d4 Bluetooth: Fix notifying discovery state upon reset
When HCI_Reset is issued the discovery state is assumed to be stopped.
The hci_cc_reset() handler was trying to set the state but it was doing
it without using the hci_discovery_set_state() function. Because of this
e.g. the mgmt Discovering event could go without being sent. This patch
fixes the code to use the hci_discovery_set_state() function instead of
just blindly setting the state value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-28 21:26:23 +01:00
Johan Hedberg
592002863a Bluetooth: Fix check for SSP when enabling SC
There's a check in set_secure_conn() that's supposed to ensure that SSP
is enabled before we try to request the controller to enable SC (since
SSP is a pre-requisite for it). However, this check only makes sense for
controllers actually supporting BR/EDR SC. If we have a 4.0 controller
we're only interested in the LE part of SC and should therefore not be
requiring SSP to be enabled. This patch adds an additional condition to
check for lmp_sc_capable(hdev) before requiring SSP to be enabled.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-28 21:26:22 +01:00
Marcel Holtmann
aa5b034565 Bluetooth: Check for P-256 OOB values in Secure Connections Only mode
If Secure Connections Only mode has been enabled, the it is important
to check that OOB data for P-256 values is provided. In case it is not,
then tell the remote side that no OOB data is present.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:21 +01:00
Marcel Holtmann
a83ed81ef5 Bluetooth: Use helper function to determine BR/EDR OOB data present
When replying to the IO capability request for Secure Simple Pairing and
Secure Connections, the OOB data present fields needs to set. Instead of
making the calculation inline, split this into a separate helper
function.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:20 +01:00
Marcel Holtmann
6665d057fb Bluetooth: Clear P-192 values for OOB when in Secure Connections Only mode
When Secure Connections Only mode has been enabled and remote OOB data
is requested, then only provide P-256 hash and randomizer vaulues. The
fields for P-192 hash and randomizer should be set to zero.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-28 21:26:20 +01:00
Johan Hedberg
d25b78e2ed Bluetooth: Enforce zero-valued hash/rand192 for LE OOB
Until legacy SMP OOB pairing is implemented user space should be given a
clear error when trying to use it. This patch adds a corresponding check
to the Add Remote OOB Data handler function which returns "invalid
parameters" if non-zero Rand192 or Hash192 parameters were given for an
LE address.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-28 21:26:19 +01:00
David S. Miller
95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Christophe Ricard
ec14b6c93c NFC: hci: Remove nfc_hci_pipe2gate function
With the newly introduced pipes table hci_dev fields,
the nfc_hci_pipe2gate routine is no longer needed.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-28 00:03:36 +01:00
Christophe Ricard
8409e4283c NFC: hci: Add cmd_received handler
When a command is received, it is sometime needed to let the CLF driver do
some additional operations. (ex: count remaining pipe notification...)

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-28 00:03:34 +01:00
Christophe Ricard
615b524aca NFC: hci: Reference every pipe information according to notification
We update the tracked pipes status when receiving HCI commands.
Also we forward HCI errors and we reply to any HCI command, even though
we don't support it.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-28 00:03:29 +01:00
Christophe Ricard
af77522320 NFC: hci: Change nfc_hci_send_response gate parameter to pipe
As there can be several pipes connected to the same gate, we need
to know which pipe ID to use when sending an HCI response. A gate
ID is not enough.

Instead of changing the nfc_hci_send_response() API to something
not aligned with the rest of the HCI API, we call nfc_hci_hcp_message_tx
directly.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:55:20 +01:00
Christophe Ricard
118278f20a NFC: hci: Add pipes table to reference them with a tuple {gate, host}
In order to keep host source information on specific hci event (such as
evt_connectivity or evt_transaction) and because 2 pipes can be connected
to the same gate, it is necessary to add a table referencing every pipe
with a {gate, host} tuple.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:39:32 +01:00
Christophe Ricard
fda7a49cb9 NFC: hci: Change event_received handler gate parameter to pipe
Several pipes may point to the same CLF gate, so getting the gate ID
as an input is not enough.
For example dual secure element may have 2 pipes (1 for uicc and
1 for eSE) pointing to the connectivity gate.

As resolving gate and host IDs can be done from a pipe, we now pass
the pipe ID to the event received handler.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-27 23:39:23 +01:00
Christoph Hellwig
06539d3071 net: don't OOPS on socket aio
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 12:25:33 -08:00
Jouni Malinen
8ade538bf3 mac80111: Add BIP-GMAC-128 and BIP-GMAC-256 ciphers
This allows mac80211 to configure BIP-GMAC-128 and BIP-GMAC-256 to the
driver and also use software-implementation within mac80211 when the
driver does not support this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:10:13 +01:00
Jouni Malinen
56c52da2d5 mac80111: Add BIP-CMAC-256 cipher
This allows mac80211 to configure BIP-CMAC-256 to the driver and also
use software-implementation within mac80211 when the driver does not
support this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:09:13 +01:00
Jouni Malinen
2b2ba0db1c mac80111: Add CCMP-256 cipher
This allows mac80211 to configure CCMP-256 to the driver and also use
software-implementation within mac80211 when the driver does not support
this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[squash ccmp256 -> mic_len argument change]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:07:35 +01:00
Jouni Malinen
00b9cfa3ff mac80111: Add GCMP and GCMP-256 ciphers
This allows mac80211 to configure GCMP and GCMP-256 to the driver and
also use software-implementation within mac80211 when the driver does
not support this with hardware accelaration.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[remove a spurious newline]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:06:09 +01:00
Jouni Malinen
cfcf1682c4 cfg80211: Add new GCMP, CCMP-256, BIP-GMAC, BIP-CMAC-256 ciphers
This makes cfg80211 aware of the GCMP, GCMP-256, CCMP-256, BIP-GMAC-128,
BIP-GMAC-256, and BIP-CMAC-256 cipher suites. These new cipher suites
were defined in IEEE Std 802.11ac-2013.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:04:57 +01:00
Jouni Malinen
37720569cc cfg80211: Fix BIP (AES-CMAC) cipher validation
This cipher can be used only as a group management frame cipher and as
such, there is no point in validating that it is not used with non-zero
key-index. Instead, verify that it is not used as a pairwise cipher
regardless of the key index.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[change code to use switch statement which is easier to extend]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 11:03:41 +01:00
Luciano Coelho
9120d94e8f mac80211: handle potential race between suspend and scan completion
If suspend starts while ieee80211_scan_completed() is running, between
the point where SCAN_COMPLETED is set and the work is queued,
ieee80211_scan_cancel() will not catch the work and we may finish
suspending before the work is actually executed, leaving the scan
running while suspended.

To fix this race, queue the scan work during resume if the
SCAN_COMPLETED flag is set and flush it immediately.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-27 09:58:46 +01:00
Herbert Xu
8ea65f4a2d netlink: Kill redundant net argument in netlink_insert
The socket already carries the net namespace with it so there is
no need to be passing another net around.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:30:30 -08:00
David S. Miller
bf693f7beb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
ipsec 2015-01-26

Just two small fixes for _decode_session6() where we
might decode to wrong header information in some rare
situations.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:28:38 -08:00
Hannes Frederic Sowa
6e9e16e614 ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.

To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.

Fixes: 4a287eba2d ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:22:14 -08:00
subashab@codeaurora.org
fc752f1f43 ping: Fix race in free in receive path
An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().

-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning this skb and processing this cloned
skb instead.

This patch was suggested by Eric Dumazet

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:04:53 -08:00
Herbert Xu
86f3cddbc3 udp_diag: Fix socket skipping within chain
While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy.  In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.

As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 00:02:41 -08:00
David S. Miller
7d63585bf0 Another set of last-minute fixes:
* fix station double-removal when suspending while associating
  * fix the HT (802.11n) header length calculation
  * fix the CCK radiotap flag used for monitoring, a pretty
    old regression but a simple one-liner
  * fix per-station group-key handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUwmkTAAoJEDBSmw7B7bqr3X0P/3Y/2UnQSBSi2LnqmsuV8igR
 noAwaPuCf2ASs25v9nQv64JM1qC0kiemYesi3fFWHiqPhoxx9WrMSvjXdarynrQN
 LwBG74tcQVEeW70w+36yeKCS+wMU35eTzDDxdnYOfJeNOhW2rQE7WVT2drkc6lVL
 hvGRhqlBx0mVLQ3VCW5c1kmMN/VAttZKx5WWWurmA8OSQzcHJGZW6ZTuqwpDx4RT
 iqwi5uBtMkqTQV+22Zb4xI+YxLGwcGiYjy15KksPsrSZZeS5pJzeyovrDhACBVpE
 aVj3K+OAGVmXCN7HVpQ3tTqCaQea5o+EDqtnT/IQrjnmw6p4mC3co6ShiVn+DkTX
 6nv/ka92k9q1LMGIzl+lje2cNmaerVrDa84gUqnHXbTr80+ugKu4NpZowQgdW4yG
 Qu4kQALkTRhUoNf+RUzXKcFqAW2qLKf83qLcrhbQWkSrY0hzLVWKI/1/FAkCyo6I
 zKhsB44mHMMaKGVq5qsHTD0E89PtiJuizDLiU04uJExYJCi1FCh9NDlc+HaVlTnt
 5LHFbrsPDuAUKoKSiJmJUIyueL+Of5N34+epuLRZb55aE2AQhApg6Nxd+FMuro1m
 0aeHEREEO04QZ7IUrYgBA4G7L1dsZJDD8auU6Y4V3chAg6ArbswbtzAv8ON1tAk+
 pr0kThACKqkfg2tN5fny
 =DhIp
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Another set of last-minute fixes:
 * fix station double-removal when suspending while associating
 * fix the HT (802.11n) header length calculation
 * fix the CCK radiotap flag used for monitoring, a pretty
   old regression but a simple one-liner
 * fix per-station group-key handling

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:32:24 -08:00
Hannes Frederic Sowa
df4d92549f ipv4: try to cache dst_entries which would cause a redirect
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:28:27 -08:00
Daniel Borkmann
600ddd6825 net: sctp: fix slab corruption from use after free on INIT collisions
When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c6 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c6 ...

[  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[  533.940559] PGD 5030f2067 PUD 0
[  533.957104] Oops: 0000 [#1] SMP
[  533.974283] Modules linked in: sctp mlx4_en [...]
[  534.939704] Call Trace:
[  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

... or depending on the the application, for example this one:

[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b

With slab debugging enabled, we can see that the poison has been overwritten:

[  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
[  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[  669.826424]  __slab_alloc+0x4bf/0x566
[  669.826433]  __kmalloc+0x280/0x310
[  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
[  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[  669.826635]  __slab_free+0x39/0x2a8
[  669.826643]  kfree+0x1d6/0x230
[  669.826650]  kzfree+0x31/0x40
[  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
[  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
[  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]

Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).

Reference counting of auth keys revisited:

Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.

User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().

sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().

To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.

Fixes: 730fc3d05c ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 17:02:05 -08:00
Erik Hugne
08bfc9cb76 flow_dissector: add tipc support
The flows are hashed on the sending node address, which allows us
to spread out the TIPC link processing to RPS enabled cores. There
is no point to include the destination address in the hash as that
will always be the same for all inbound links. We have experimented
with a 3-tuple hash over [srcnode, sport, dport], but this showed to
give slightly lower performance because of increased lock contention
when the same link was handled by multiple cores.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 16:58:09 -08:00
Erik Hugne
3fa9cacd69 tipc: fix excessive network event logging
If a large number of namespaces is spawned on a node and TIPC is
enabled in each of these, the excessive printk tracing of network
events will cause the system to grind down to a near halt.
The traces are still of debug value, so instead of removing them
completely we fix it by changing the link state and node availability
logging debug traces.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 16:58:08 -08:00
Daniel Borkmann
fd3e646c87 net: act_bpf: fix size mismatch on filter preparation
Similarly as in cls_bpf, also this code needs to reject mismatches.

Reference: http://article.gmane.org/gmane.linux.network/347406
Fixes: d23b8ad8ab ("tc: add BPF based action")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 16:08:55 -08:00
Daniel Borkmann
1c1bc6bdb7 net: cls_basic: return from walking on match in basic_get
As soon as we've found a matching handle in basic_get(), we can
return it. There's no need to continue walking until the end of
a filter chain, since they are unique anyway.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 16:08:55 -08:00
Daniel Borkmann
3f2ab13594 net: cls_bpf: fix auto generation of per list handles
When creating a bpf classifier in tc with priority collisions and
invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
will return a wrong handle id which in fact is non-unique. Usually
altering of specific filters is being addressed over major id, but
in case of collisions we result in a filter chain, where handle ids
address individual cls_bpf_progs inside the classifier.

Issue is, in cls_bpf_grab_new_handle() we probe for head->hgen handle
in cls_bpf_get() and in case we found a free handle, we're supposed
to use exactly head->hgen. In case of insufficient numbers of handles,
we bail out later as handle id 0 is not allowed.

Fixes: 7d1d65cb84 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:50:19 -08:00
Daniel Borkmann
7913ecf69e net: cls_bpf: fix size mismatch on filter preparation
In cls_bpf_modify_existing(), we read out the number of filter blocks,
do some sanity checks, allocate a block on that size, and copy over the
BPF instruction blob from user space, then pass everything through the
classic BPF checker prior to installation of the classifier.

We should reject mismatches here, there are 2 scenarios: the number of
filter blocks could be smaller than the provided instruction blob, so
we do a partial copy of the BPF program, and thus the instructions will
either be rejected from the verifier or a valid BPF program will be run;
in the other case, we'll end up copying more than we're supposed to,
and most likely the trailing garbage will be rejected by the verifier
as well (i.e. we need to fit instruction pattern, ret {A,K} needs to be
last instruction, load/stores must be correct, etc); in case not, we
would leak memory when dumping back instruction patterns. The code should
have only used nla_len() as Dave noted to avoid this from the beginning.
Anyway, lets fix it by rejecting such load attempts.

Fixes: 7d1d65cb84 ("net: sched: cls_bpf: add BPF-based classifier")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:50:18 -08:00
Joe Stringer
74ed7ab926 openvswitch: Add support for unique flow IDs.
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.

This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.

All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:45:50 -08:00
Joe Stringer
272c2cf841 openvswitch: Use sw_flow_key_range for key ranges.
These minor tidyups make a future patch a little tidier.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:45:50 -08:00
Joe Stringer
d29ab6f8a9 openvswitch: Refactor ovs_flow_tbl_insert().
Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
This tidies up a future patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:45:49 -08:00
Joe Stringer
5b4237bbc9 openvswitch: Refactor ovs_nla_fill_match().
Refactor the ovs_nla_fill_match() function into separate netlink
serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
parameter - all callers need to nest the flow, and callers have better
knowledge about whether it is serializing a mask or not.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 15:45:49 -08:00
Christophe Ricard
511e78a38a NFC: nfc_disable_se Remove useless blank line at beginning of function
Remove one useless blank line at beginning of nfc_disable_se function.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 23:14:33 +01:00
Christophe Ricard
ec0684898f NFC: nfc_enable_se Remove useless blank line at beginning of function
Remove one useless blank line at beginning of nfc_enable_se function.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-01-26 23:14:33 +01:00
Pablo Neira Ayuso
e8781f70a5 netfilter: nf_tables: disable preemption when restoring chain counters
With CONFIG_DEBUG_PREEMPT=y

[22144.496057] BUG: using smp_processor_id() in preemptible [00000000] code: iptables-compat/10406
[22144.496061] caller is debug_smp_processor_id+0x17/0x1b
[22144.496065] CPU: 2 PID: 10406 Comm: iptables-compat Not tainted 3.19.0-rc4+ #
[...]
[22144.496092] Call Trace:
[22144.496098]  [<ffffffff8145b9fa>] dump_stack+0x4f/0x7b
[22144.496104]  [<ffffffff81244f52>] check_preemption_disabled+0xd6/0xe8
[22144.496110]  [<ffffffff81244f90>] debug_smp_processor_id+0x17/0x1b
[22144.496120]  [<ffffffffa07c557e>] nft_stats_alloc+0x94/0xc7 [nf_tables]
[22144.496130]  [<ffffffffa07c73d2>] nf_tables_newchain+0x471/0x6d8 [nf_tables]
[22144.496140]  [<ffffffffa07c5ef6>] ? nft_trans_alloc+0x18/0x34 [nf_tables]
[22144.496154]  [<ffffffffa063c8da>] nfnetlink_rcv_batch+0x2b4/0x457 [nfnetlink]

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-26 11:50:02 +01:00
Eric Dumazet
1dc7b90f7c ipv6: tcp: fix race in IPV6_2292PKTOPTIONS
IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
to charge the skb to socket.

It means that destructor must be called while socket is locked.

Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
to protect ourselves : kfree_skb() might race with other users
manipulating sk->sk_forward_alloc

Fix this race by holding socket lock for the duration of
ip6_datagram_recv_ctl()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 00:44:08 -08:00
Dan Carpenter
1b846f9282 bridge: simplify br_getlink() a bit
Static checkers complain that we should maybe set "ret" before we do the
"goto out;".  They interpret the NULL return from br_port_get_rtnl() as
a failure and forgetting to set the error code is a common bug in this
situation.

The code is confusing but it's actually correct.  We are returning zero
deliberately.  Let's re-write it a bit to be more clear.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 23:32:49 -08:00
Martin KaFai Lau
b0a1ba5992 ipv6: Fix __ip6_route_redirect
In my last commit (a3c00e4: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect is incorrect.  The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
   one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e4, will stop the backtrack from
   happening.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 22:09:51 -08:00
Vivien Didelot
24df8986f3 net: dsa: set slave MII bus PHY mask
When registering a mdio bus, Linux assumes than every port has a PHY and tries
to scan it. If a switch port has no PHY registered, DSA will fail to register
the slave MII bus. To fix this, set the slave MII bus PHY mask to the switch
PHYs mask.

As an example, if we use a Marvell MV88E6352 (which is a 7-port switch with no
registered PHYs for port 5 and port 6), with the following declared names:

	static struct dsa_chip_data switch_cdata = {
		[...]
		.port_names[0] = "sw0",
		.port_names[1] = "sw1",
		.port_names[2] = "sw2",
		.port_names[3] = "sw3",
		.port_names[4] = "sw4",
		.port_names[5] = "cpu",
	};

DSA will fail to create the switch instance. With the PHY mask set for the
slave MII bus, only the PHY for ports 0-4 will be scanned and the instance will
be successfully created.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 16:00:54 -08:00
Harout Hedeshian
c2943f1453 net: ipv6: Add sysctl entry to disable MTU updates from RA
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.

Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:54:41 -08:00
Alexander Duyck
64c6272351 fib_trie: Various clean-ups for handling slen
While doing further work on the fib_trie I noted a few items.

First I was using calls that were far more complicated than they needed to
be for determining when to push/pull the suffix length.  I have updated the
code to reflect the simplier logic.

The second issue is that I realised we weren't necessarily handling the
case of a leaf_info struct surviving a flush.  I have updated the logic so
that now we will call pull_suffix in the event of having a leaf info value
left in the leaf after flushing it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck
02525368f4 fib_trie: Move fib_find_alias to file where it is used
The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck
30cfe7c9c8 fib_trie: Use empty_children instead of counting empty nodes in stats collection
It doesn't make much sense to count the pointers ourselves when
empty_children already has a count for the number of NULL pointers stored
in the tnode.  As such save ourselves the cycles and just use
empty_children.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck
95f60ea3e9 fib_trie: Add collapse() and should_collapse() to resize
This patch really does two things.

First it pulls the logic for determining if we should collapse one node out
of the tree and the actual code doing the collapse into a separate pair of
functions.  This helps to make the changes to these areas more readable.

Second it encodes the upper 32b of the empty_children value onto the
full_children value in the case of bits == KEYLENGTH.  By doing this we are
able to handle the case of a 32b node where empty_children would appear to
be 0 when it was actually 1ul << 32.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck
a80e89d4c6 fib_trie: Fall back to slen update on inflate/halve failure
This change corrects an issue where if inflate or halve fails we were
exiting the resize function without at least updating the slen for the
node.  To correct this I have moved the update of max_size into the while
loop so that it is only decremented on a successful call to either inflate
or halve.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:16 -08:00
Alexander Duyck
69fa57b1e4 fib_trie: Fix RCU bug and merge similar bits of inflate/halve
This patch addresses two issues.

The first issue is the fact that I believe I had the RCU freeing sequence
slightly out of order.  As a result we could get into an issue if a caller
went into a child of a child of the new node, then backtraced into the to be
freed parent, and then attempted to access a child of a child that may have
been consumed in a resize of one of the new nodes children.  To resolve this I
have moved the resize after we have freed the oldtnode.  The only side effect
of this is that we will now be calling resize on more nodes in the case of
inflate due to the fact that we don't have a good way to test to see if a
full_tnode on the new node was there before or after the allocation.  This
should have minimal impact however since the node should already be
correctly size so it is just the cost of calling should_inflate that we
will be taking on the node which is only a couple of cycles.

The second issue is the fact that inflate and halve were essentially doing
the same thing after the new node was added to the trie replacing the old
one.  As such it wasn't really necessary to keep the code in both functions
so I have split it out into two other functions, called replace and
update_children.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:15 -08:00
Alexander Duyck
b3832117b4 fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits
In doing performance testing and analysis of the changes I recently found
that by shifting the index I had created an unnecessary dependency.

I have updated the code so that we instead shift a mask by bits and then
just test against that as that should save us about 2 CPU cycles since we
can generate the mask while the key and pos are being processed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:47:15 -08:00
Sasha Levin
6b8d9117cc net: llc: use correct size for sysctl timeout entries
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 00:23:21 -08:00
Tom Herbert
af33c1adae vxlan: Eliminate dependency on UDP socket in transmit path
In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Tom Herbert
d998f8efa4 udp: Do not require sock in udp_tunnel_xmit_skb
The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP created for receive
side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-24 23:15:40 -08:00
Trond Myklebust
c4a7ca7749 SUNRPC: Allow waiting on memory allocation
We should be safe now, as long as we don't do GFP_IO or higher allocations

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:50 -05:00
Trond Myklebust
127b21b89f SUNRPC: Adjust rpciod workqueue parameters
Increase the concurrency level for rpciod threads to allow for allocations
etc that happen in the RPCSEC_GSS layer. Also note that the NFSv4 byte range
locks may now need to allocate memory from inside rpciod.

Add the WQ_HIGHPRI flag to improve latency guarantees while we're at it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-01-24 18:46:49 -05:00
Nicolas Dichtel
193523bf93 vxlan: advertise netns of vxlan dev in fdb msg
Netlink FDB messages are sent in the link netns. The header of these messages
contains the ifindex (ndm_ifindex) of the netdevice, but this ifindex is
unusable in case of x-netns vxlan.
I named the new attribute NDA_NDM_IFINDEX_NETNSID, to avoid confusion with
NDA_IFINDEX.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:51:15 -08:00
Nicolas Dichtel
1f17257b1f vlan: advertise link netns via netlink
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:51:15 -08:00
Nicolas Dichtel
3390e39761 ip6gretap: advertise link netns via netlink
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:51:14 -08:00
Nicolas Dichtel
bdef279b99 rtnl: fix error path when adding an iface with a link net
If an error occurs when the netdevice is moved to the link netns, a full cleanup
must be done.

Fixes: 317f4810e4 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:51:14 -08:00
Thomas Graf
d7924450e1 act_connmark: Add missing dependency on NF_CONNTRACK_MARK
Depending on NETFILTER is not sufficient to ensure the presence of the
'mark' field in nf_conn, also needs to depend on NF_CONNTRACK_MARK.

Fixes: 22a5dc ("net: sched: Introduce connmark action")
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:46:09 -08:00
Peter Hurley
dfb2fae7cd Bluetooth: Fix nested sleeps
l2cap/rfcomm/sco_sock_accept() are wait loops which may acquire
sleeping locks. Since both wait loops and sleeping locks use
task_struct.state to sleep and wake, the nested sleeping locks
destroy the wait loop state.

Use the newly-minted wait_woken() and DEFINE_WAIT_FUNC() for the
wait loop. DEFINE_WAIT_FUNC() allows an alternate wake function
to be specified; in this case, the predefined scheduler function,
woken_wake_function(). This wait construct ensures wakeups will
not be missed without requiring the wait loop to set the
task state before condition evaluation. How this works:

 CPU 0                            |  CPU 1
                                  |
                                  | is <condition> set?
                                  | no
set <condition>                   |
                                  |
wake_up_interruptible             |
  woken_wake_function             |
    set WQ_FLAG_WOKEN             |
    try_to_wake_up                |
                                  | wait_woken
                                  |   set TASK_INTERRUPTIBLE
                                  |   WQ_FLAG_WOKEN? yes
                                  |   set TASK_RUNNING
                                  |
                                  | - loop -
				  |
				  | is <condition> set?
                                  | yes - exit wait loop

Fixes "do not call blocking ops when !TASK_RUNNING" warnings
in l2cap_sock_accept(), rfcomm_sock_accept() and sco_sock_accept().

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-23 20:29:42 +02:00
Johan Hedberg
a1443f5a27 Bluetooth: Convert Set SC to use HCI Request
This patch converts the Set Secure Connection HCI handling to use a HCI
request instead of using a hard-coded callback in hci_event.c. This e.g.
ensures that we don't clear the flags incorrectly if something goes
wrong with the power up process (not related to a mgmt Set SC command).

The code can also be simplified a bit since only one pending Set SC
command is allowed, i.e. mgmt_pending_foreach usage is not needed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-23 19:07:03 +01:00
Johan Hedberg
484aabc1c4 Bluetooth: Remove incorrect check for BDADDR_BREDR address type
The Add Remote OOB Data mgmt command should allow data to be passed for
LE as well. This patch removes a left-over check for BDADDR_BREDR that
should not be there anymore.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-23 18:59:31 +01:00
Johan Hedberg
5d57e7964c Bluetooth: Check for valid bdaddr in add_remote_oob_data
Before doing any other verifications, the add_remote_oob_data function
should first check that the given address is valid. This patch adds such
a missing check to the beginning of the function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-23 18:59:30 +01:00
Jeff Layton
3c5199143b sunrpc/lockd: fix references to the BKL
The BKL is completely out of the picture in the lockd and sunrpc code
these days. Update the antiquated comments that refer to it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-23 10:29:12 -05:00
Johannes Berg
225b818982 mac80211: support beacon statistics
For drivers without beacon filtering, support beacon statistics
entirely, i.e. report the number of beacons and average signal.

For drivers with beacon filtering, give them the number of beacons
received by mac80211 -- in case the device reports only the number
of filtered beacons then driver doesn't have to count all beacons
again as mac80211 already does.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 15:51:38 +01:00
Johannes Berg
3d6dc3431e mac80211: fix per-TID RX-MSDU counter
In the case of non-QoS association, the counter was actually
wrong. The right index isn't security_idx but seqno_idx, as
security_idx will be 0 for data frames, while 16 is needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 13:26:25 +01:00
Johannes Berg
c5309ba787 mac80211: tdls: disentangle HT supported conditions
These conditions are rather difficult to follow, for example
because "!sta" only exists to not crash in the case that we
don't have a station pointer (WLAN_TDLS_SETUP_REQUEST) in
which the additional condition (peer supports HT) doesn't
actually matter anyway.

Cleaning this up only duplicates two lines of code but makes
the rest far easier to read, so do that.

As a side effect, smatch stops complaining about the lack of
a sta pointer test after the !sta (since the !sta goes away)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 11:42:14 +01:00
Johannes Berg
d6f5cc091b mac80211: tdls: remove shadowing variable
There's no need to use another local 'sta' variable as the
original (outer scope) one isn't needed any more and has
become invalid anyway when exiting the RCU read section.

Remove the inner scope one and along with it the useless NULL
initialization.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 11:37:44 +01:00
Johannes Berg
13874e4b23 nl80211: suppress smatch warnings
smatch warns that we once checked request->ssids in two functions
and then unconditionally used it later again.

This is actually fine, because the code has a relationship between
attrs[NL80211_ATTR_SCAN_SSIDS], n_ssids and request->ssids, but
smatch isn't smart enough to realize that.

Suppress the warnings by always checking just n_ssids - that way
smatch won't know that request->ssids could be NULL, and since it
is only NULL when n_ssids is 0 we still check everything correctly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 11:25:20 +01:00
Johannes Berg
0fa7b39131 nl80211: fix per-station group key get/del and memory leak
In case userspace attempts to obtain key information for or delete a
unicast key, this is currently erroneously rejected unless the driver
sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it
was never noticed.

Fix that, and while at it fix a potential memory leak: the error path
in the get_key() function was placed after allocating a message but
didn't free it - move it to a better place. Luckily admin permissions
are needed to call this operation.

Cc: stable@vger.kernel.org
Fixes: e31b82136d ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 11:21:02 +01:00
Bob Copeland
985e88b13a Revert "mac80211: keep sending peer candidate events while in listen state"
This reverts commit 2ae70efcea.

The new peer events that are generated by the change are causing problems
with wpa_supplicant in userspace: wpa_s tries to restart SAE authentication
with the peer when receiving the event, even though authentication may be in
progress already, and it gets very confused.

Revert back to the original operating mode, which is to only get events when
there is no corresponding station entry.

Cc: Nishikawa, Kenzoh <Kenzoh.Nishikawa@jp.sony.com>
Cc: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:57:19 +01:00
Luciano Coelho
332ff7fe36 mac80211: complete scan work immediately if quiesced or suspended
It is possible that a deferred scan is queued after the queues are
flushed in __ieee80211_suspend().  The deferred scan work may be
scheduled by ROC or ieee80211_stop_poll().

To make sure don't start a new scan while suspending, check whether
we're quiescing or suspended and complete the scan immediately if
that's the case.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:54:22 +01:00
Emmanuel Grumbach
4afaff176a mac80211: avoid races related to suspend flow
When we go to suspend, there is complex set of states that
avoids races. The quiescing variable is set whlie
__ieee80211_suspend is running. Then suspended is set.
The code makes sure there is no window without any of these
flags.

The problem is that workers can still be enqueued while we
are quiescing. This leads to situations where the driver is
already suspending and other flows like disassociation are
handled by a worker.

To fix this, we need to check quiescing and suspended flags
in the worker itself and not only before enqueueing it.
I also add here extensive documentation to ease the
understanding of these complex issues.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:54:22 +01:00
Emmanuel Grumbach
14f2ae83d0 mac80211: synchronize_net() before flushing the queues
When mac80211 disconnects, it drops all the packets on the
queues. This happens after the net stack has been notified
that we have no link anymore (netif_carrier_off).
netif_carrier_off ensures that no new packets are sent to
xmit() callback, but we might have older packets in the
middle of the Tx path. These packets will land in the
driver's queues after the latter have been flushed.
Synchronize_net() between netif_carrier_off and drv_flush()
will fix this.

Note that we can't call synchronize_net inside
ieee80211_flush_queues since there are flows that call
ieee80211_flush_queues and don't need synchronize_net()
which is an expensive operation.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[reword comment to be more accurate]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:54:21 +01:00
Mathy Vanhoef
3a5c5e81d8 mac80211: properly set CCK flag in radiotap
Fix a regression introduced by commit a5e70697d0 ("mac80211: add radiotap flag
and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was
incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by
using the CCK flag again.

Cc: stable@vger.kernel.org
Fixes: a5e70697d0 ("mac80211: add radiotap flag and handling for 5/10 MHz")
Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:53:58 +01:00
Fred Chou
fb142f4bbb mac80211: correct header length calculation
HT Control field may also be present in management frames, as defined
in 8.2.4.1.10 of 802.11-2012. Account for this in calculation of header
length.

Signed-off-by: Fred Chou <fred.chou.nd@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:52:48 +01:00
Luciano Coelho
2af81d6718 mac80211: only roll back station states for WDS when suspending
In normal cases (i.e. when we are fully associated), cfg80211 takes
care of removing all the stations before calling suspend in mac80211.

But in the corner case when we suspend during authentication or
association, mac80211 needs to roll back the station states.  But we
shouldn't roll back the station states in the suspend function,
because this is taken care of in other parts of the code, except for
WDS interfaces.  For AP types of interfaces, cfg80211 takes care of
disconnecting all stations before calling the driver's suspend code.
For station interfaces, this is done in the quiesce code.

For WDS interfaces we still need to do it here, so move the code into
a new switch case for WDS.

Cc: stable@kernel.org [3.15+]
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:47:40 +01:00
Luciano Coelho
9c74893441 nl80211: add an attribute to allow delaying the first scheduled scan cycle
The userspace may want to delay the the first scheduled scan or
net-detect cycle.  Add an optional attribute to the scheduled scan
configuration to pass the delay to be (optionally) used by the driver.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
[add the attribute to the policy to validate it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:30:47 +01:00
Lorenzo Bianconi
db82d8a966 mac80211: enable TPC through mac80211 stack
Control per packet Transmit Power Control (TPC) in lower drivers
according to TX power settings configured by the user. In particular TPC is
enabled if value passed in enum nl80211_tx_power_setting is
NL80211_TX_POWER_LIMITED (allow using less than specified from userspace),
whereas TPC is disabled if nl80211_tx_power_setting is set to
NL80211_TX_POWER_FIXED (use value configured from userspace)

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:28:51 +01:00
Vadim Kochan
4b681c82d2 nl80211: Allow set network namespace by fd
Added new NL80211_ATTR_NETNS_FD which allows to
set namespace via nl80211 by fd.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-23 10:25:25 +01:00
Johannes Berg
fa7e1fbcb5 mac80211: allow drivers to control software crypto
Some drivers unfortunately cannot support software crypto, but
mac80211 currently assumes that they do.

This has the issue that if the hardware enabling fails for some
reason, the software fallback is used, which won't work. This
clearly isn't desirable, the error should be reported and the
key setting refused.

Support this in mac80211 by allowing drivers to set a new HW
flag IEEE80211_HW_SW_CRYPTO_CONTROL, in which case mac80211 will
only allow software fallback if the set_key() method returns 1.
The driver will also need to advertise supported cipher suites
so that mac80211 doesn't advertise any (future) software ciphers
that the driver can't actually do.

While at it, to make it easier to support this, refactor the
ieee80211_init_cipher_suites() code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-22 22:01:01 +01:00
Marcel Holtmann
ed93ec69c7 Bluetooth: Require SSP enabling before BR/EDR Secure Connections
When BR/EDR is supported by a controller, then it is required to enable
Secure Simple Pairing first before enabling the Secure Connections
feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-22 21:44:20 +02:00
Marcel Holtmann
3a5486e1fd Bluetooth: Limit BR/EDR switching for LE only with secure connections
When a powered on dual-mode controller has been configured to operate
as LE only with secure connections, then the BR/EDR side of things can
not be switched back on. Do reconfigure the controller it first needs
to be powered down.

The secure connections feature is implemented in the BR/EDR controller
while for LE it is implemented in the host. So explicitly forbid such
a transaction to avoid inconsistent states.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-22 21:42:45 +02:00
Marcel Holtmann
574ea3c713 Bluetooth: Fix dependency for BR/EDR Secure Connections mode on SSP
The BR/EDR Secure Connections feature should only be enabled when the
Secure Simple Pairing mode has been enabled first. However since secure
connections is feature that is valid for BR/EDR and LE, this needs
special handling.

When enabling secure connections on a LE only configured controller,
thent the BR/EDR side should not be enabled in the controller. This
patches makes the BR/EDR Secure Connections feature depending on
enabling Secure Simple Pairing mode first.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-22 21:42:18 +02:00
Szymon Janc
91200e9f3e Bluetooth: Fix reporting invalid RSSI for LE devices
Start Discovery was reporting 0 RSSI for invalid RSSI only for
BR/EDR devices. LE devices were reported with RSSI 127.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.19+
2015-01-22 18:06:43 +01:00
Johannes Berg
54330bf63b mac80211: fix HW registration error paths
Station info state is started in allocation, so should be
destroyed on free (it's just a timer); rate control must
be freed if anything afterwards fails to initialize.

LED exit should be later, no need for locking there, but
it needs to be done also when rate init failed.

Also clean up the code by moving a label so the locking
doesn't have to be done separately.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-21 17:34:37 +01:00
Michael S. Tsirkin
7754f53e94 virtio/9p: verify device has config space
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/9p needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:45 +10:30
David S. Miller
0c49087462 Some further updates for net-next:
* fix network-manager which was broken by the previous changes
  * fix delete-station events, which were broken by me making the
    genlmsg_end() mistake
  * fix a timer left running during suspend in some race conditions
    that would cause an annoying (but harmless) warning
  * (less important, but in the tree already) remove 80+80 MHz rate
    reporting since the spec doesn't distinguish it from 160 MHz;
    as the bitrate they're both 160 MHz bandwidth
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUvUZlAAoJEDBSmw7B7bqrfNMQAJT5jjOSjmwW8Zdvujvda/qt
 bFpYa9t0NsN3izzMpjPSrCwPrHN5qE86ZA8TcZrIzejPH4rpltjaXB6JNHZardVo
 deCUWU9xotoPELoE0Xex9mHPEkYYvOaht/P8A/88qP1S2PykMmj9fqNeijyUvwuo
 Jlsh0wKe4Jq6bCmdxvy/bde84ceAQcuh2TKNov1S0tB38tRY9qSu2n6ZGpoMNcEe
 CWuW+23jL1uAvt6Ljk2fTKdLR8iyXykfM0UMX2/4R2PMnJXK/dQqV/eeXTjpxtMk
 UV4aIMcSS19D7HxICKOXOdZLdMMuXaFUnUMlGLBtXZz3n9lZoP7RtVIHoib8VBXZ
 tY7xQrK6YNvwZ4SZZPuv/yr6YWP2+vBM2FUfXjzD+or6uYsej203a5q0RsOY+3Tp
 6Yklr+mQNlrOtpMsHMSgrBUUZsAK1I95ALMVVqaq1hgf1cDvRIDHlOo4A7bjwNFw
 eA3L1o4O1/E/IGp4v6+2Iukc9rIwm11sNr/wuD8dDkZTradUPH1iY6J5sxJNb2Nq
 YdpCnQ/lNXj650+z9/G2omSA6DTTTOtXJPxKR+FOHZVKDpZYtF6TxKb0S79fINps
 6ZlWIna5bUiXF1b6xad1x+vtyjNMgTvkg6mR+WQnvF57Ri8hucbtpv5wpA5bhYUQ
 Fbz9VZF2nfMeIbXfTaWi
 =Bvmr
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Some further updates for net-next:
 * fix network-manager which was broken by the previous changes
 * fix delete-station events, which were broken by me making the
   genlmsg_end() mistake
 * fix a timer left running during suspend in some race conditions
   that would cause an annoying (but harmless) warning
 * (less important, but in the tree already) remove 80+80 MHz rate
   reporting since the spec doesn't distinguish it from 160 MHz;
   as the bitrate they're both 160 MHz bandwidth

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 16:22:19 -05:00
Johannes Berg
926e9878a3 phonet netlink: allow multiple messages per skb in route dump
My previous patch to this file changed the code to be bug-compatible
towards userspace. Unless userspace (which I wasn't able to find)
implements the dump reader by hand in a wrong way, this isn't needed.
If it uses libnl or similar code putting multiple messages into a
single SKB is far more efficient.

Change the code to do this. While at it, also clean it up and don't
use so many variables - just store the address in the callback args
directly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 16:20:17 -05:00
Felix Fietkau
22a5dc0e5e net: sched: Introduce connmark action
This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.

There are known limitations currently:

doesn't work for initial packets, since we only query the ct table.
  Fine given use case is for returning packets

no implicit defrag.
  frags should be rare so fix later..

won't work for more complex tasks, e.g. lookup of other extensions
  since we have no means to store results

we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.

V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 16:02:06 -05:00
Florian Fainelli
8db0a2ee2c net: bridge: reject DSA-enabled master netdevices as bridge members
DSA-enabled master network devices with a switch tagging protocol should
strip the protocol specific format before handing the frame over to
higher layer.

When adding such a DSA master network device as a bridge member, we go
through the following code path when receiving a frame:

__netif_receive_skb_core
	-> first ptype check against ptype_all is not returning any
	   handler for this skb

	-> check and invoke rx_handler:
		-> deliver frame to the bridge layer: br_handle_frame

DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is
called *after* the bridge-layer rx_handler has run. br_handle_frame()
tries to parse the frame it received from the DSA master network device,
and will not be able to match any of its conditions and jumps straight
at the end of the end of br_handle_frame() and returns
RX_HANDLER_CONSUMED there.

Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops
RX processing for this frame and returns NET_RX_SUCCESS, so we never get
a chance to call our switch tag packet processing logic and deliver
frames to the DSA slave network devices, and so we do not get any
functional bridge members at all.

Instead of cluttering the bridge receive path with DSA-specific checks,
and rely on assumptions about how __netif_receive_skb_core() is
processing frames, we simply deny adding the DSA master network device
(conduit interface) as a bridge member, leaving only the slave DSA
network devices to be bridge members, since those will work correctly in
all circumstances.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 15:45:10 -05:00
Florian Fainelli
728c02089a net: ipv4: handle DSA enabled master network devices
The logic to configure a network interface for kernel IP
auto-configuration is very simplistic, and does not handle the case
where a device is stacked onto another such as with DSA. This causes the
kernel not to open and configure the master network device in a DSA
switch tree, and therefore slave network devices using this master
network devices as conduit device cannot be open.

This restriction comes from a check in net/dsa/slave.c, which is
basically checking the master netdev flags for IFF_UP and returns
-ENETDOWN if it is not the case.

Automatically bringing-up DSA master network devices allows DSA slave
network devices to be used as valid interfaces for e.g: NFS root booting
by allowing kernel IP autoconfiguration to succeed on these interfaces.

On the reverse path, make sure we do not attempt to close a DSA-enabled
device as this would implicitely prevent the slave DSA network device
from operating.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 15:45:10 -05:00
Hagen Paul Pfeifer
9d289715eb ipv6: stop sending PTB packets for MTU < 1280
Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:52:07 -05:00
Nicolas Dichtel
317f4810e4 rtnl: allow to create device with IFLA_LINK_NETNSID set
This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In fact, it allows to have a symetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:32:03 -05:00
Nicolas Dichtel
1728d4fabd tunnels: advertise link netns via netlink
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:32:03 -05:00
Nicolas Dichtel
d37512a277 rtnl: add link netns id to interface messages
This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels).
With this attribute, it's possible to interpret correctly all advertised
information (like IFLA_LINK, etc.).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:21:26 -05:00
Nicolas Dichtel
0c7aecd4bd netns: add rtnl cmd to add and get peer netns ids
With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where it is added (ie valid only into this
netns).

The main function (ie the one exported to other module), peernet2id(), allows to
get the id of a peer netns. If no id has been assigned by the user, this
function allocates one.

These ids will be used in netlink messages to point to a peer netns, for example
in case of a x-netns interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:21:18 -05:00
Emmanuel Grumbach
c1e140bf79 mac80211: delete the assoc/auth timer upon suspend
While suspending, we destroy the authentication /
association that might be taking place. While doing so, we
forgot to delete the timer which can be firing after
local->suspended is already set, producing the warning below.

Fix that by deleting the timer.

[66722.825487] WARNING: CPU: 2 PID: 5612 at net/mac80211/util.c:755 ieee80211_can_queue_work.isra.18+0x32/0x40 [mac80211]()
[66722.825487] queueing ieee80211 work while going to suspend
[66722.825529] CPU: 2 PID: 5612 Comm: kworker/u16:69 Tainted: G        W  O  3.16.1+ #24
[66722.825537] Workqueue: events_unbound async_run_entry_fn
[66722.825545] Call Trace:
[66722.825552]  <IRQ>  [<ffffffff817edbb2>] dump_stack+0x4d/0x66
[66722.825556]  [<ffffffff81075cad>] warn_slowpath_common+0x7d/0xa0
[66722.825572]  [<ffffffffa06b5b90>] ? ieee80211_sta_bcn_mon_timer+0x50/0x50 [mac80211]
[66722.825573]  [<ffffffff81075d1c>] warn_slowpath_fmt+0x4c/0x50
[66722.825586]  [<ffffffffa06977a2>] ieee80211_can_queue_work.isra.18+0x32/0x40 [mac80211]
[66722.825598]  [<ffffffffa06977d5>] ieee80211_queue_work+0x25/0x50 [mac80211]
[66722.825611]  [<ffffffffa06b5bac>] ieee80211_sta_timer+0x1c/0x20 [mac80211]
[66722.825614]  [<ffffffff8108655a>] call_timer_fn+0x8a/0x300

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-19 18:59:20 +01:00
Johannes Berg
6e9f3fa4f0 Revert "wireless: Support of IFLA_INFO_KIND rtnl attribute"
This reverts commit ba1debdfed.

Oliver reported that it breaks network-manager, for some reason with
this patch NM decides that the device isn't wireless but "generic"
(ethernet), sees no carrier (as expected with wifi) and fails to do
anything else with it.

Revert this to unbreak userspace.

Reported-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-19 18:49:56 +01:00
Pablo Neira Ayuso
75e8d06d43 netfilter: nf_tables: validate hooks in NAT expressions
The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.

This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-01-19 14:52:39 +01:00
Rosen, Rami
4de8b41370 bridge: remove oflags from setlink/dellink.
Commit 02dba4388d ("bridge: fix setlink/dellink notifications") removed usage of oflags in
both rtnl_bridge_setlink() and rtnl_bridge_dellink() methods. This patch removes this variable as it is no
longer needed.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 01:22:48 -05:00
David S. Miller
7b46a644a4 netlink: Fix bugs in nlmsg_end() conversions.
Commit 053c095a82 ("netlink: make nlmsg_end() and genlmsg_end()
void") didn't catch all of the cases where callers were breaking out
on the return value being equal to zero, which they no longer should
when zero means success.

Fix all such cases.

Reported-by: Marcel Holtmann <marcel@holtmann.org>
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 23:36:08 -05:00
Johannes Berg
053c095a82 netlink: make nlmsg_end() and genlmsg_end() void
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 01:03:45 -05:00
Richard Alpe
d6e164e321 tipc: fix socket list regression in new nl api
Commit 07f6c4bc (tipc: convert tipc reference table to use generic
rhashtable) introduced a problem with port listing in the new netlink
API. It broke the resume functionality resulting in a never ending
loop. This was caused by starting with the first hash table every time
subsequently never returning an empty skb (terminating).

This patch fixes the resume mechanism by keeping a logical reference
to the last hash table along with a logical reference to the socket
(port) that didn't fit in the previous message.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:27:05 -05:00
David S. Miller
e445dd5f67 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-01-16

Here are some more bluetooth & ieee802154 patches intended for 3.20:

 - Refactoring & cleanups of ieee802154 & 6lowpan code
 - Various fixes to the btmrvl driver
 - Fixes for Bluetooth Low Energy Privacy feature handling
 - Added build-time sanity checks for sockaddr sizes
 - Fixes for Security Manager registration on LE-only controllers
 - Refactoring of broken inquiry mode handling to a generic quirk

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:25:30 -05:00
Jiri Pirko
3aeb66176f net: replace br_fdb_external_learn_* calls with switchdev notifier events
This patch benefits from newly introduced switchdev notifier and uses it
to propagate fdb learn events from rocker driver to bridge. That avoids
direct function calls and possible use by other listeners (ovs).

Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:23:57 -05:00
Jiri Pirko
03bf0c2812 switchdev: introduce switchdev notifier
This patch introduces new notifier for purposes of exposing events which happen
on switch driver side. The consumers of the event messages are mainly involved
masters, namely bridge and ovs.

Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18 00:23:57 -05:00
Nicolas Dichtel
66c1a12c65 socket: use ki_nbytes instead of iov_length()
This field already contains the length of the iovec, no need to calculate it
again.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:58:37 -05:00
Daniel Borkmann
2061dcd6bf net: sctp: fix race for one-to-many sockets in sendmsg's auto associate
I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.

Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.

The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.

strace from example application (shortened):

socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
           msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0

tcpdump before patch (fooling the application):

22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

tcpdump after patch:

14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

Looks like this bug is from the pre-git history museum. ;)

Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:52:20 -05:00
Jiri Pirko
33e9fcc666 tc: cls_bpf: rename bpf_len to bpf_num_ops
It was suggested by DaveM to change the name as "len" might indicate
unit bytes.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:51:10 -05:00
Jiri Pirko
d23b8ad8ab tc: add BPF based action
This action provides a possibility to exec custom BPF code.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:51:10 -05:00
Roopa Prabhu
02dba4388d bridge: fix setlink/dellink notifications
problems with bridge getlink/setlink notifications today:
        - bridge setlink generates two notifications to userspace
                - one from the bridge driver
                - one from rtnetlink.c (rtnl_bridge_notify)
        - dellink generates one notification from rtnetlink.c. Which
	means bridge setlink and dellink notifications are not
	consistent

        - Looking at the code it appears,
	If both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF were set,
        the size calculation in rtnl_bridge_notify can be wrong.
        Example: if you set both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF
        in a setlink request to rocker dev, rtnl_bridge_notify will
	allocate skb for one set of bridge attributes, but,
	both the bridge driver and rocker dev will try to add
	attributes resulting in twice the number of attributes
	being added to the skb.  (rocker dev calls ndo_dflt_bridge_getlink)

There are multiple options:
1) Generate one notification including all attributes from master and self:
   But, I don't think it will work, because both master and self may use
   the same attributes/policy. Cannot pack the same set of attributes in a
   single notification from both master and slave (duplicate attributes).

2) Generate one notification from master and the other notification from
   self (This seems to be ideal):
     For master: the master driver will send notification (bridge in this
	example)
     For self: the self driver will send notification (rocker in the above
	example. It can use helpers from rtnetlink.c to do so. Like the
	ndo_dflt_bridge_getlink api).

This patch implements 2) (leaving the 'rtnl_bridge_notify' around to be used
with 'self').

v1->v2 :
	- rtnl_bridge_notify is now called only for self,
	so, remove 'BRIDGE_FLAGS_SELF' check and cleanup a few things
	- rtnl_bridge_dellink used to always send a RTM_NEWLINK msg
	earlier. So, I have changed the notification from br_dellink to
	go as RTM_NEWLINK

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-17 23:49:51 -05:00
Johannes Berg
ee1c244219 genetlink: synchronize socket closing and family removal
In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.

Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)

To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.

This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.

Also remove the warning since it would still trigger, but is now no
longer a problem.

This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:25 -05:00
Johannes Berg
5ad6300524 genetlink: disallow subscribing to unknown mcast groups
Jeff Layton reported that he could trigger the multicast unbind warning
in generic netlink using trinity. I originally thought it was a race
condition between unregistering the generic netlink family and closing
the socket, but there's a far simpler explanation: genetlink currently
allows subscribing to groups that don't (yet) exist, and the warning is
triggered when unsubscribing again while the group still doesn't exist.

Originally, I had a warning in the subscribe case and accepted it out of
userspace API concerns, but the warning was of course wrong and removed
later.

However, I now think that allowing userspace to subscribe to groups that
don't exist is wrong and could possibly become a security problem:
Consider a (new) genetlink family implementing a permission check in
the mcast_bind() function similar to the like the audit code does today;
it would be possible to bypass the permission check by guessing the ID
and subscribing to the group it exists. This is only possible in case a
family like that would be dynamically loaded, but it doesn't seem like a
huge stretch, for example wireless may be loaded when you plug in a USB
device.

To avoid this reject such subscription attempts.

If this ends up causing userspace issues we may need to add a workaround
in af_netlink to deny such requests but not return an error.

Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 17:04:24 -05:00
Johannes Berg
5700712122 cfg80211: fix checking nl80211_send_station() return value
The return value from nl80211_send_station() is the length of the
skb, or a negative error, so abort sending the message only when
the return value was negative.

This fixes the ibss_rsn wpa_supplicant test case.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-16 21:05:52 +01:00
Johannes Berg
5e06a9e8b6 mac80211: remove doubled semicolon
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-16 13:27:56 +01:00
Rickard Strandqvist
0026b6551b Bluetooth: Remove unused function
Remove the function hci_conn_change_link_key() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-16 13:06:38 +02:00
Herbert Xu
919d9db952 netlink: Fix netlink_insert EADDRINUSE error
The patch c5adde9468 ("netlink:
eliminate nl_sk_hash_lock") introduced a bug where the EADDRINUSE
error has been replaced by ENOMEM.  This patch rectifies that
problem.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 02:38:07 -05:00
Eric Dumazet
ac64da0b83 net: rps: fix cpu unplug
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.

A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.

Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.

This version is better because we do not slow down packet processing,
only make migration safer.

Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:02:42 -05:00
Willem de Bruijn
f812116b17 ip: zero sockaddr returned on error queue
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:41:16 -05:00
Nicolas Dichtel
12d872511c bridge: use MDBA_SET_ENTRY_MAX for maxtype in nlmsg_parse()
This is just a cleanup, because in the current code MDBA_SET_ENTRY_MAX ==
MDBA_SET_ENTRY.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:37:20 -05:00
David S. Miller
aaef66b837 Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUt7Z6AAoJEDBSmw7B7bqrgxoQALEEWcIJ1wmu+M7ijdiXLiUM
 vRNuxGENwIgfdmoTs6R7pgKhEYFzePWccjHOzt9cQB5efdRzDjrxj2fDrPf4o5JB
 7of2uHGoaD2RI2H+pJS1URT8igmxDJii+bOEzHn/WL730Hgr2J2iuJizxZ2lzsVM
 VKkiwOykV3kfN5MGsj7yvJQXR32DlGfmiT86+3bjNhE8hgU38NgE0TeUUnyF0AS9
 jLV5mpJfkLmZyZmnvszV5tiqQQmQAdHImI+vbHuhzNUUAn6RLswxbWBzUrLXpXqu
 5KBR2P/6TU4X89NcYGm+JhTI9PghsMbh1zDuqDQ9gq8j0mrV7Kzh1K6LdYoVpfXf
 s42gHe32+Mh0l6LRTlsjftMxJbFla7I6madPcVTqJCV2y1LocD1BseJ+qX5bngU1
 lBSSbzE9MlAl5gyHVDh1CAV+8FM0CP8Ff3WtAyr8XtDxfAUwmo3xBqmL8pvLq6nh
 49kDqDVOzC5KzASYIjqBwmRMcqW2AnaNQG64iIOzM3ure/l5trncPHHPsMkxgwu+
 dDgEXwjWhJNaxWt7fcTSZndATLCRvkeb6ZeRoqmY6A2GJgzpUIhm6HETXc9BNGbg
 3J56176xx04LYg6U5+vMiU5A+gFjlrUknQ3MGXF0KPgw0MvtSyempobV68Lpul4r
 6DviuT9NiRqxloaBimyx
 =bMKg
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:28:36 -05:00
David S. Miller
27f097177d Here's a big pile of changes for this round.
We have
  * a lot of regulatory code changes to deal with the
    way newer Intel devices handle this
  * a change to drop packets while disconnecting from
    an AP instead of trying to wait for them
  * a new attempt at improving the tailroom accounting
    to not kick in too much for performance reasons
  * improvements in wireless link statistics
  * many other small improvements and small fixes that
    didn't seem necessary for 3.19 (e.g. in hwsim which
    is testing only code)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUt7WEAAoJEDBSmw7B7bqrVBoP/2EViE62HMmXdqG1SZWz8q9o
 Iigq8STC/sT2WCx1pYm+tKuVW4LD2O3mCriGNP8A3RwzDZ6H7sKJYb1gV6QCPV6f
 4+yT5VSAB3D3lHmp/bbyNsmKCBQ5uS4LVgDrokrkbGpacDu94PYS5Wv9t3x6PBVB
 5Xjky6g6A/pSuxTIstSO9k5xkzNjaB1TxvVRz/gJrGcFQVkDFSlVbuTHUVxs8p+p
 k6mwY/2WYijZkswWZVQTJLQlF9vRI7PYkKs5m8gz4pjNU48oFJoyu4IP3Z1Xj/Sm
 zgT1C9rgp0Du74HYO2niGAvLWgKajAZuW5hIacDndUPjYQQBLgGs/bCJGSntM+x9
 XoOdPixdFPT/58ijyYZlmHc8rxPOd2kHsVbwGplp8f195S4VO04D+ejfOaoAUFwX
 v/kMvO3XIFmEH1jjkDAC3OTcRMYVMuENyWl7pFzxHIzPeRiEpQUd9iSdM4yol0F2
 ZyWvKud4U75Sh+aCiDIIBETtdfCRFe12hgKs4COYbI/UYkGPTPrNei/uisopdubT
 JC+7pZOYdSgoX12yVi6ds6DmKE/ZpIQyhIK4wTWgVoszbnfdb9Mw7mJEThwNRjeK
 JJPsbuty7u8HWjXzEqHLoTV3BFv1cgRSJc5Wt0zfME+LzD7iQpEpv+QBAguwwChD
 Osn55Z3FnKEmBdGcOIje
 =vaEW
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Here's a big pile of changes for this round.

We have
 * a lot of regulatory code changes to deal with the
   way newer Intel devices handle this
 * a change to drop packets while disconnecting from
   an AP instead of trying to wait for them
 * a new attempt at improving the tailroom accounting
   to not kick in too much for performance reasons
 * improvements in wireless link statistics
 * many other small improvements and small fixes that
   didn't seem necessary for 3.19 (e.g. in hwsim which
   is testing only code)

Conflicts:
	drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c

Minor overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:16:56 -05:00
Eric Dumazet
5055c371bf ipv4: per cpu uncached list
RAW sockets with hdrinc suffer from contention on rt_uncached_lock
spinlock.

One solution is to use percpu lists, since most routes are destroyed
by the cpu that created them.

It is unclear why we even have to put these routes in uncached_list,
as all outgoing packets should be freed when a device is dismantled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: caacf05e5a ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 18:26:16 -05:00
Johannes Berg
b51f3beecf cfg80211: change bandwidth reporting to explicit field
For some reason, we made the bandwidth separate flags, which
is rather confusing - a single rate cannot have different
bandwidths at the same time.

Change this to no longer be flags but use a separate field
for the bandwidth ('bw') instead.

While at it, add support for 5 and 10 MHz rates - these are
reported as regular legacy rates with their real bitrate,
but tagged as 5/10 now to make it easier to distinguish them.

In the nl80211 API, the flags are preserved, but the code
now can also clearly only set a single one of the flags.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 22:41:32 +01:00
Chuck Lever
a97c331f9a svcrdma: Handle additional inline content
Most NFS RPCs place their large payload argument at the end of the
RPC header (eg, NFSv3 WRITE). For NFSv3 WRITE and SYMLINK, RPC/RDMA
sends the complete RPC header inline, and the payload argument in
the read list. Data in the read list is the last part of the XDR
stream.

One important case is not like this, however. NFSv4 COMPOUND is a
counted array of operations. A WRITE operation, with its large data
payload, can appear in the middle of the compound's operations
array. Thus NFSv4 WRITE compounds can have header content after the
WRITE payload.

The Linux client, for example, performs an NFSv4 WRITE like this:

  { PUTFH, WRITE, GETATTR }

Though RFC 5667 is not precise about this, the proper way to convey
this compound is to place the GETATTR inline, _after_ the front of
the RPC header. The receiver inserts the read list payload into the
XDR stream after the initial WRITE arguments, and before the GETATTR
operation, thanks to the value of the read list "position" field.

The Linux client currently sends the GETATTR at the end of the
RPC/RDMA read list, which is incorrect. It will be corrected in the
future.

The Linux server currently rejects NFSv4 compounds with inline
content after the read list. For the above NFSv4 WRITE compound, the
NFS compound header indicates there are three operations, but the
server finds nonsense when it looks in the XDR stream for the third
operation, and the compound fails with OP_ILLEGAL.

Move trailing inline content to the end of the XDR buffer's page
list. This presents incoming NFSv4 WRITE compounds to NFSD in the
same way the socket transport does.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:49 -05:00
Chuck Lever
fcbeced5b4 svcrdma: Move read list XDR round-up logic
This is a pre-requisite for a subsequent patch.

Read list XDR round-up needs to be done _before_ additional inline
content is copied to the end of the XDR buffer's page list. Move
the logic added by commit e560e3b510 ("svcrdma: Add zero padding
if the client doesn't send it").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:48 -05:00
Chuck Lever
0b056c224b svcrdma: Support RDMA_NOMSG requests
Currently the Linux server can not decode RDMA_NOMSG type requests.
Operations whose length exceeds the fixed size of RDMA SEND buffers,
like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via
RDMA_NOMSG.

For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC
headers, and some or all of the NFS arguments via RDMA SEND.

For an RDMA_NOMSG type request, the client sends just the RPC/RDMA
header via RDMA SEND. The request's read list contains elements for
the entire RPC message, including the RPC header.

NFSD expects the RPC/RMDA header and RPC header to be contiguous in
page zero of the XDR buffer. Add logic in the RDMA READ path to make
the read list contents land where the server prefers, when the
incoming message is a type RDMA_NOMSG message.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:47 -05:00
Chuck Lever
61edbcb7c7 svcrdma: rc_position sanity checking
An RPC/RDMA client may send large RPC arguments via a read
list. This is a list of scatter/gather elements which convey
RPC call arguments too large to fit in a small RDMA SEND.

Each entry in the read list has a "position" field, whose value is
the byte offset in the XDR stream where the data in that entry is to
be inserted. Entries which share the same "position" value make up
the same RPC argument. The receiver inserts entries with the same
position field value in list order into the XDR stream.

Currently the Linux NFS/RDMA server cannot handle receiving read
chunks in more than one position, mostly because no current client
sends read lists with elements in more than one position. As a
sanity check, ensure that all received chunks have the same
"rc_position."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:47 -05:00
Chuck Lever
e54524111f svcrdma: Plant reader function in struct svcxprt_rdma
The RDMA reader function doesn't change once an svcxprt_rdma is
instantiated. Instead of checking sc_devcap during every incoming
RPC, set the reader function once when the connection is accepted.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:46 -05:00
Chuck Lever
e5523bd281 svcrdma: Find rmsgp more reliably
xdr_start() can return the wrong rmsgp address if an assumption
about how the xdr_buf was constructed changes.  When it gets it
wrong, the client receives a reply that has gibberish in the
RPC/RDMA header, preventing it from matching a waiting RPC request.

Instead, make (and document) just one assumption: that the RDMA
header for the client's RPC call is at the start of the first page
in rq_pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:45 -05:00
Chuck Lever
3fe04ee9f9 svcrdma: Scrub BUG_ON() and WARN_ON() call sites
Current convention is to avoid using BUG_ON() in places where an
oops could cause complete system failure.

Replace BUG_ON() call sites in svcrdma with an assertion error
message and allow execution to continue safely.

Some BUG_ON() calls are removed because they have never fired in
production (that we are aware of).

Some WARN_ON() calls are also replaced where a back trace is not
helpful; e.g., in a workqueue task.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:45 -05:00
Chuck Lever
2397aa8b51 svcrdma: Clean up read chunk counting
The byte_count argument is not used, and the function is called
only from one place.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:44 -05:00
Chuck Lever
83f2bedfc6 svcrdma: Remove unused variable
Nit: remove an unused variable to squelch a compiler warning.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:43 -05:00
Chuck Lever
597561bf6a svcrdma: Clean up dprintk
Nit: Fix inconsistent white space in dprintk messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-15 15:01:43 -05:00
Marcel Holtmann
2b8df32395 Bluetooth: Add paranoid check for existing LE and BR/EDR SMP channels
When the SMP channels have been already registered, then print out a
clear WARN_ON message that something went wrong. Also unregister the
existing channels in this case before trying to register new ones.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 21:59:38 +02:00
Nicolas Dichtel
7eb35b1483 socket: use iov_length()
Better to use available helpers.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 13:56:15 -05:00
Johan Hedberg
327a71910c Bluetooth: Fix lookup of fixed channels by local bdaddr
The comparing of chan->src should always be done against the local
identity address, represented by hcon->src and hcon->src_type. This
patch modifies l2cap_global_fixed_chan() to take the full hci_conn so
that we can easily compare against hcon->src and hcon->src_type.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-15 16:54:14 +01:00
Johan Hedberg
a250e048a7 Bluetooth: Add helpers for src/dst bdaddr type conversion
The current bdaddr_type() usage in l2cap_core.c is a bit funny in that
it's always passed a hci_conn + a hci_conn member. Because of this only
the hci_conn is really needed. Since the second parameter is always
either hcon->src_type or hcon->dst type this patch adds two helper
functions for each purpose: bdaddr_src_type() and bdaddr_dst_type().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-15 16:54:14 +01:00
Johannes Berg
97d910d0aa cfg80211: remove 80+80 MHz rate reporting
These rates are treated the same as 160 MHz in the spec, so
it makes no sense to distinguish them. As no driver uses them
yet, this is also not a problem, just remove them.

In the userspace API the field remains reserved to preserve
API and ABI.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 16:05:21 +01:00
Johannes Berg
f89903d53f mac80211: remove 80+80 MHz rate reporting
These rates are treated the same as 160 MHz in the spec,
so it makes no sense to distinguish them. As no driver
uses them yet, this is also not a problem, just remove
them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-15 16:02:46 +01:00
Marcel Holtmann
162a3bac8d Bluetooth: Bind the SMP channel registration to management power state
When the controller gets powered on via the management interface, then
register the supported SMP channels. There is no point in registering
these channels earlier since it is not know what identity address the
controller is going to operate with.

When powering down a controller unregister all SMP channels. This is
required since a powered down controller is allowed to change its
identity address.

In addition the SMP channels are only available when the controller
is powered via the management interface. When using legacy ioctl, then
Bluetooth Low Energy is not supported and registering kernel side SMP
integration may actually cause confusion.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 12:54:31 +02:00
Marcel Holtmann
7e7ec44564 Bluetooth: Don't register any SMP channel if LE is not supported
When LE features are not supported, then do not bother registering any
kind of SMP channel.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 12:54:30 +02:00
Marcel Holtmann
157029ba30 Bluetooth: Fix LE SMP channel source address and source address type
The source address and source address type of the LE SMP channel can
either be the public address of the controller or the static random
address configured by the host.

Right now the public address is used for the LE SMP channel and
obviously that is not correct if the controller operates with the
configured static random address.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 12:54:30 +02:00
Marcel Holtmann
111e4bccd1 Bluetooth: Fix issue with switching BR/EDR back on when disabled
For dual-mode controllers it is possible to disable BR/EDR and operate
as LE single mode controllers with a static random address. If that is
the case, then refuse switching BR/EDR back on after the controller has
been powered.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 10:27:47 +02:00
Marcel Holtmann
eeb5a067d1 Bluetooth: Show device address type for L2CAP debugfs entries
The devices address types are BR/EDR Public, LE Public and LE Random and
any of these three is valid for L2CAP connections. So show the correct
type in the debugfs list.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-15 10:23:47 +02:00
David S. Miller
4e7a84b1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter updates for net-next

The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:

1) Rise default number of buckets in conntrack from 16384 to 65536 in
   systems with >= 4GBytes, patch from Marcelo Leitner.

2) Small refactor to save one level on indentation in xt_osf, from
   Joe Perches.

3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.

4) Another small cleanup to remove redundant variable in nfnetlink,
   from Duan Jiong.

5) Fix compilation warning in nfnetlink_cthelper on parisc, from
   Chen Gang.

6) Fix wrong format in debugging for ctseqadj, from Gao feng.

7) Selective conntrack flushing through the mark for ctnetlink, patch
   from Kristian Evensen.

8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
   not required anymore after the selective flushing patch, again from
   Kristian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:50:25 -05:00
Thomas Graf
1dd144cf5b openvswitch: Support VXLAN Group Policy extension
Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.

  ovs-vsctl add-port br0 vxlan0 -- \
     set Interface vxlan0 type=vxlan options:exts=gbp

The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.

The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
internally just like Geneve options but transported as nested Netlink
attributes to user space.

Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.

The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Thomas Graf
81bfe3c3cf openvswitch: Allow for any level of nesting in flow attributes
nlattr_set() is currently hardcoded to two levels of nesting. This change
introduces struct ovs_len_tbl to define minimal length requirements plus
next level nesting tables to traverse the key attributes to arbitrary depth.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Thomas Graf
d91641d9b5 openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
Thomas Graf
3511494ce2 vxlan: Group Policy extension
Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels.  However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP  frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
 iptables:
  host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
  host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

 OVS:
  # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
  # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 01:11:41 -05:00
David S. Miller
3f3558bb51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 00:53:17 -05:00
Linus Torvalds
a6391a924c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't use uninitialized data in IPVS, from Dan Carpenter.

 2) conntrack race fixes from Pablo Neira Ayuso.

 3) Fix TX hangs with i40e, from Jesse Brandeburg.

 4) Fix budget return from poll calls in dnet and alx, from Eric
    Dumazet.

 5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
    Jaeger.

 6) Fix bug introduced by conversion to list_head in TIPC retransmit
    code, from Jon Paul Maloy.

 7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
    Khoroshilov.

 8) Fix bridge build with INET disabled, from Arnd Bergmann.

 9) Fix netlink array overrun for PROBE attributes in openvswitch, from
    Thomas Graf.

10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
    Prashant Sreedharan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  tg3: Release tp->lock before invoking synchronize_irq()
  tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
  tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
  team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
  openvswitch: packet messages need their own probe attribtue
  i40e: adds FCoE configure option
  cxgb4vf: Fix queue allocation for 40G adapter
  netdevice: Add missing parentheses in macro
  bridge: only provide proxy ARP when CONFIG_INET is enabled
  neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
  net: fec: fix MDIO bus assignement for dual fec SoC's
  xen-netfront: use different locks for Rx and Tx stats
  drivers: net: cpsw: fix multicast flush in dual emac mode
  cxgb4vf: Initialize mdio_addr before using it
  net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
  usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
  MAINTAINERS: add me as ibmveth maintainer
  tipc: fix bug in broadcast retransmit code
  update ip-sysctl.txt documentation (v2)
  net/at91_ether: prepare and unprepare clock
  ...
2015-01-15 11:17:37 +13:00
Thomas Graf
1ba398041f openvswitch: packet messages need their own probe attribtue
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.

Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.

Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:49:44 -05:00
Jukka Rissanen
7b2ed60ed4 Bluetooth: 6lowpan: Remove PSM setting code
Removing PSM setting debugfs interface as the IPSP has a well
defined PSM value that should be used.

The patch introduces enable flag that can be used to toggle
6lowpan on/off.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-14 22:48:13 +01:00
Johan Hedberg
e12af489b9 Bluetooth: Fix valid Identity Address check
According to the Bluetooth core specification valid identity addresses
are either Public Device Addresses or Static Random Addresses. IRKs
received with any other type of address should be discarded since we
cannot assume to know the permanent identity of the peer device.

This patch fixes a missing check for the Identity Address when receiving
the Identity Address Information SMP PDU.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.17+
2015-01-14 22:48:06 +01:00
zhuyj
9a6b4b392d ipv6:icmp:remove unnecessary brackets
There are too many brackets. Maybe only one bracket is enough.

Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:35:49 -05:00
Fan Du
3f4c1d87af openvswitch: Introduce ovs_tunnel_route_lookup
Introduce ovs_tunnel_route_lookup to consolidate route lookup
shared by vxlan, gre, and geneve ports.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 16:32:06 -05:00
Tom Herbert
a2b12f3c7a udp: pass udp_offload struct to UDP gro callbacks
This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:20:04 -05:00
Arnd Bergmann
d92cfdbbea bridge: only provide proxy ARP when CONFIG_INET is enabled
When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:

net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'

This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 958501163d ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 15:08:02 -05:00
Gowtham Anandha Babu
36c269cecf Bluetooth: Remove dead code
Variable 'controller' is assigned a value that is never used.
Identified by cppcheck tool.

Signed-off-by: Gowtham Anandha Babu <gowtham.ab@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-14 11:16:17 +02:00
Luciano Coelho
75453ccb61 nl80211: send netdetect configuration info in NL80211_CMD_GET_WOWLAN
Send the netdetect configuration information in the response to
NL8021_CMD_GET_WOWLAN commands.  This includes the scan interval,
SSIDs to match and frequencies to scan.

Additionally, add the NL80211_WOWLAN_TRIG_NET_DETECT with
NL80211_ATTR_WOWLAN_TRIGGERS_SUPPORTED.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:45:17 +01:00
Arik Nemtsov
ef51fb1d1c cfg80211: avoid reg-hints in self-managed only systems
When a system contains only self-managed regulatory devices all hints
from the regulatory core are ignored. Stop hint processing early in this
case. These systems usually don't have CRDA deployed, which results in
endless (irrelevent) logs of the form:
cfg80211: Calling CRDA to update world regulatory domain

Make sure there's at least one self-managed device before discarding a
hint, in order to prevent initial hints from disappearing on CRDA
managed systems.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:43:44 +01:00
Arik Nemtsov
2c3e861c94 cfg80211: introduce sync regdom set API for self-managed
A self-managed device will sometimes need to set its regdomain synchronously.
Notably it should be set before usermode has a chance to query it. Expose
a new API to accomplish this which requires the RTNL.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:43:44 +01:00
Eliad Peller
2726f23d2d mac80211: don't defer scans in case of radar detection
Radar detection can last indefinite time. There is no
point in deferring a scan request in this case - simply
return -EBUSY.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:37:07 +01:00
Eliad Peller
e7f2337ae7 mac80211: consider only relevant vifs for radar_required calculation
ctx->conf.radar_enabled should reflect whether radar
detection is enabled for the channel context.

When calculating it, make it consider only the vifs
that have this context assigned (instead of all the
vifs).

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:37:06 +01:00
Eliad Peller
5cbc95a749 mac80211: remove local->radar_detect_enabled
local->radar_detect_enabled should tell whether
radar_detect is enabled on any interface belonging
to local.

However, it's not getting updated correctly
in many cases (actually, when testing with hwsim
it's never been set, even when the dfs master
is beaconing).

Instead of handling all the corner cases
(e.g. channel switch), simply check whether
radar detection is enabled only when needed,
instead of caching the result.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:37:06 +01:00
Arik Nemtsov
50075892ba mac80211: add TDLS supported channels correctly
The function adding the supported channels IE during a TDLS connection had
several issues:
1. If the entire subband is usable, the function exitted the loop without
   adding it
2. The function only checked chandef_usable, ignoring flags like RADAR
   which would prevent TDLS off-channel communcation.
3. HT20 was explicitly required in the chandef, while not a requirement
   for TDLS off-channel.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:34:33 +01:00
Emmanuel Grumbach
3b24f4c653 mac80211: let flush() drop packets when possible
When roaming / suspending, it makes no sense to wait until
the transmit queues of the device are empty. In extreme
condition they can be starved (VO saturating the air), but
even in regular cases, it is pointless to delay the roaming
because the low level driver is trying to send packets to
an AP which is far away. We'd rather drop these packets and
let TCP retransmit if needed. This will allow to speed up
the roaming.

For suspend, the explanation is even more trivial.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-14 09:31:18 +01:00
Marcel Holtmann
5ced24644b Bluetooth: Use %llu for printing duration details of selftests
The duration variable for the selftests is unsigned long long and with
that use %llu instead of %lld when printing the results.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-14 10:02:45 +02:00
Marcel Holtmann
36f260ceff Bluetooth: Move Delete Stored Link Key to 4th phase of initialization
This moves the execution of Delete Stored Link Key command to the
hci_init4_req phase. No actual code has been changed. The command
is just executed at a later stage of the initialization.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-14 10:02:21 +02:00
Jean-Francois Remy
4bf6980dd0 neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.

This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
    recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.

This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.

Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.

Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:28:00 -05:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Jiri Pirko
d8b9605d26 net: sched: fix skb->protocol use in case of accelerated vlan path
tc code implicitly considers skb->protocol even in case of accelerated
vlan paths and expects vlan protocol type here. However, on rx path,
if the vlan header was already stripped, skb->protocol contains value
of next header. Similar situation is on tx path.

So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Sasha Levin
357c4774b5 tipc: correctly handle releasing a not fully initialized sock
Commit f2f9800d49 "tipc: make tipc node table aware of net
namespace" has added a dereference of sock->sk before making sure it's
not NULL, which makes releasing a tipc socket NULL pointer dereference
for sockets that are not fully initialized.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:26:27 -05:00
Ying Xue
3721e9c7c1 tipc: remove redundant timer defined in tipc_sock struct
Remove the redundant timer defined in tipc_sock structure, instead we
can directly reuse the sk_timer defined in sock structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 16:45:55 -05:00
Roopa Prabhu
0fe6de4903 bridge: fix uninitialized variable warning
net/bridge/br_netlink.c: In function ‘br_fill_ifinfo’:
net/bridge/br_netlink.c:146:32: warning: ‘vid_range_flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  err = br_fill_ifvlaninfo_range(skb, vid_range_start,
                                ^
net/bridge/br_netlink.c:108:6: note: ‘vid_range_flags’ was declared here
  u16 vid_range_flags;

Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 16:39:36 -05:00
Syam Sidhardhan
a440edf1fc openvswitch: Remove unnecessary version.h inclusion
version.h inclusion is not necessary as detected by versioncheck.

Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 14:31:41 -05:00
Sébastien Barré
08abdffa1c tcp: avoid reducing cwnd when ACK+DSACK is received
With TLP, the peer may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.

This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.

The following packetdrill test, from Neal Cardwell, validates this patch:

// Establish a connection.
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

// Send 1 packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1:1001(1000) ack 1

// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Send another packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1001:2001(1000) ack 1

// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

Credits:
-Gregory helped in finding that tcp_process_tlp_ack was where the cwnd
got reduced in our MPTCP tests.
-Neal wrote the packetdrill test above
-Yuchung reworked the patch to make it more readable.

Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 14:22:02 -05:00
Ying Xue
c5adde9468 netlink: eliminate nl_sk_hash_lock
As rhashtable_lookup_compare_insert() can guarantee the process
of search and insertion is atomic, it's safe to eliminate the
nl_sk_hash_lock. After this, object insertion or removal will
be protected with per bucket lock on write side while object
lookup is guarded with rcu read lock on read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 14:01:00 -05:00
Pankaj Gupta
1059590254 net: allow large number of rx queues
netif_alloc_rx_queues() uses kcalloc() to allocate memory
for "struct netdev_queue *_rx" array.
If we are doing large rx queue allocation kcalloc() might
fail, so this patch does a fallback to vzalloc().
Similar implementation is done for tx queue allocation in
netif_alloc_netdev_queues().

We avoid failure of high order memory allocation
with the help of vzalloc(), this allows us to do large
rx and tx queue allocation which in turn helps us to
increase the number of queues in tun.

As vmalloc() adds overhead on a critical network path,
__GFP_REPEAT flag is used with kzalloc() to do this fallback
only when really needed.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 17:05:05 -05:00
Rickard Strandqvist
ddcde70cbf net: sched: sch_teql: Remove unused function
Remove the function teql_neigh_release() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:50:46 -05:00
Rickard Strandqvist
83400b990c net: xfrm: xfrm_algo: Remove unused function
Remove the function aead_entries() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:50:46 -05:00
Roopa Prabhu
36cd0ffbab bridge: new function to pack vlans into ranges during gets
This patch adds new function to pack vlans into ranges
whereever applicable using the flags BRIDGE_VLAN_INFO_RANGE_BEGIN
and BRIDGE VLAN_INFO_RANGE_END

Old vlan packing code is moved to a new function and continues to be
called when filter_mask is RTEXT_FILTER_BRVLAN.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:47:05 -05:00
Roopa Prabhu
bdced7ef78 bridge: support for multiple vlans and vlan ranges in setlink and dellink requests
This patch changes bridge IFLA_AF_SPEC netlink attribute parser to
look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
userspace to pack more than one vlan in the setlink msg.

The dumps were already sending more than one vlan info in the getlink msg.

This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END to indicate start and end of vlan range

This patch also deletes unused ifla_br_policy.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:47:04 -05:00
Ying Xue
d49e204161 tipc: make netlink support net namespace
Currently tipc module only allows users sitting on "init_net" namespace
to configure it through netlink interface. But now almost each tipc
component is able to be aware of net namespace, so it's time to open
the permission for users residing in other namespaces, allowing them
to configure their own tipc stack instance through netlink interface.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:34 -05:00
Ying Xue
bafa29e341 tipc: make tipc random value aware of net namespace
After namespace is supported, each namespace should own its private
random value. So the global variable representing the random value
must be moved to tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
a62fbccecd tipc: make subscriber server support net namespace
TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
3474753954 tipc: make tipc node address support net namespace
If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
4ac1c8d0ee tipc: name tipc name table support net namespace
TIPC name table is used to store the mapping relationship between
TIPC service name and socket port ID. When tipc supports namespace,
it allows users to publish service names only owned by a certain
namespace. Therefore, every namespace must have its private name
table to prevent service names published to one namespace from being
contaminated by other service names in another namespace. Therefore,
The name table global variable (ie, nametbl) and its lock must be
moved to tipc_net structure, and a parameter of namespace must be
added for necessary functions so that they can obtain name table
variable defined in tipc_net structure.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
e05b31f4bf tipc: make tipc socket support net namespace
Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
1da465683a tipc: make tipc broadcast link support net namespace
TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:33 -05:00
Ying Xue
7f9f95d9d9 tipc: make bearer list support net namespace
Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00
Ying Xue
f2f9800d49 tipc: make tipc node table aware of net namespace
Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:24:32 -05:00