This change corrects an issue where if inflate or halve fails we were
exiting the resize function without at least updating the slen for the
node. To correct this I have moved the update of max_size into the while
loop so that it is only decremented on a successful call to either inflate
or halve.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses two issues.
The first issue is the fact that I believe I had the RCU freeing sequence
slightly out of order. As a result we could get into an issue if a caller
went into a child of a child of the new node, then backtraced into the to be
freed parent, and then attempted to access a child of a child that may have
been consumed in a resize of one of the new nodes children. To resolve this I
have moved the resize after we have freed the oldtnode. The only side effect
of this is that we will now be calling resize on more nodes in the case of
inflate due to the fact that we don't have a good way to test to see if a
full_tnode on the new node was there before or after the allocation. This
should have minimal impact however since the node should already be
correctly size so it is just the cost of calling should_inflate that we
will be taking on the node which is only a couple of cycles.
The second issue is the fact that inflate and halve were essentially doing
the same thing after the new node was added to the trie replacing the old
one. As such it wasn't really necessary to keep the code in both functions
so I have split it out into two other functions, called replace and
update_children.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In doing performance testing and analysis of the changes I recently found
that by shifting the index I had created an unnecessary dependency.
I have updated the code so that we instead shift a mask by bits and then
just test against that as that should save us about 2 CPU cycles since we
can generate the mask while the key and pos are being processed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the vxlan transmit path there is no need to reference the socket
for a tunnel which is needed for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminate references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP created for receive
side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink FDB messages are sent in the link netns. The header of these messages
contains the ifindex (ndm_ifindex) of the netdevice, but this ifindex is
unusable in case of x-netns vxlan.
I named the new attribute NDA_NDM_IFINDEX_NETNSID, to avoid confusion with
NDA_IFINDEX.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an error occurs when the netdevice is moved to the link netns, a full cleanup
must be done.
Fixes: 317f4810e4 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on NETFILTER is not sufficient to ensure the presence of the
'mark' field in nf_conn, also needs to depend on NF_CONNTRACK_MARK.
Fixes: 22a5dc ("net: sched: Introduce connmark action")
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case userspace attempts to obtain key information for or delete a
unicast key, this is currently erroneously rejected unless the driver
sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it
was never noticed.
Fix that, and while at it fix a potential memory leak: the error path
in the get_key() function was placed after allocating a message but
didn't free it - move it to a better place. Luckily admin permissions
are needed to call this operation.
Cc: stable@vger.kernel.org
Fixes: e31b82136d ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix a regression introduced by commit a5e70697d0 ("mac80211: add radiotap flag
and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was
incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by
using the CCK flag again.
Cc: stable@vger.kernel.org
Fixes: a5e70697d0 ("mac80211: add radiotap flag and handling for 5/10 MHz")
Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
HT Control field may also be present in management frames, as defined
in 8.2.4.1.10 of 802.11-2012. Account for this in calculation of header
length.
Signed-off-by: Fred Chou <fred.chou.nd@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In normal cases (i.e. when we are fully associated), cfg80211 takes
care of removing all the stations before calling suspend in mac80211.
But in the corner case when we suspend during authentication or
association, mac80211 needs to roll back the station states. But we
shouldn't roll back the station states in the suspend function,
because this is taken care of in other parts of the code, except for
WDS interfaces. For AP types of interfaces, cfg80211 takes care of
disconnecting all stations before calling the driver's suspend code.
For station interfaces, this is done in the quiesce code.
For WDS interfaces we still need to do it here, so move the code into
a new switch case for WDS.
Cc: stable@kernel.org [3.15+]
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* fix network-manager which was broken by the previous changes
* fix delete-station events, which were broken by me making the
genlmsg_end() mistake
* fix a timer left running during suspend in some race conditions
that would cause an annoying (but harmless) warning
* (less important, but in the tree already) remove 80+80 MHz rate
reporting since the spec doesn't distinguish it from 160 MHz;
as the bitrate they're both 160 MHz bandwidth
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUvUZlAAoJEDBSmw7B7bqrfNMQAJT5jjOSjmwW8Zdvujvda/qt
bFpYa9t0NsN3izzMpjPSrCwPrHN5qE86ZA8TcZrIzejPH4rpltjaXB6JNHZardVo
deCUWU9xotoPELoE0Xex9mHPEkYYvOaht/P8A/88qP1S2PykMmj9fqNeijyUvwuo
Jlsh0wKe4Jq6bCmdxvy/bde84ceAQcuh2TKNov1S0tB38tRY9qSu2n6ZGpoMNcEe
CWuW+23jL1uAvt6Ljk2fTKdLR8iyXykfM0UMX2/4R2PMnJXK/dQqV/eeXTjpxtMk
UV4aIMcSS19D7HxICKOXOdZLdMMuXaFUnUMlGLBtXZz3n9lZoP7RtVIHoib8VBXZ
tY7xQrK6YNvwZ4SZZPuv/yr6YWP2+vBM2FUfXjzD+or6uYsej203a5q0RsOY+3Tp
6Yklr+mQNlrOtpMsHMSgrBUUZsAK1I95ALMVVqaq1hgf1cDvRIDHlOo4A7bjwNFw
eA3L1o4O1/E/IGp4v6+2Iukc9rIwm11sNr/wuD8dDkZTradUPH1iY6J5sxJNb2Nq
YdpCnQ/lNXj650+z9/G2omSA6DTTTOtXJPxKR+FOHZVKDpZYtF6TxKb0S79fINps
6ZlWIna5bUiXF1b6xad1x+vtyjNMgTvkg6mR+WQnvF57Ri8hucbtpv5wpA5bhYUQ
Fbz9VZF2nfMeIbXfTaWi
=Bvmr
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Some further updates for net-next:
* fix network-manager which was broken by the previous changes
* fix delete-station events, which were broken by me making the
genlmsg_end() mistake
* fix a timer left running during suspend in some race conditions
that would cause an annoying (but harmless) warning
* (less important, but in the tree already) remove 80+80 MHz rate
reporting since the spec doesn't distinguish it from 160 MHz;
as the bitrate they're both 160 MHz bandwidth
Signed-off-by: David S. Miller <davem@davemloft.net>
My previous patch to this file changed the code to be bug-compatible
towards userspace. Unless userspace (which I wasn't able to find)
implements the dump reader by hand in a wrong way, this isn't needed.
If it uses libnl or similar code putting multiple messages into a
single SKB is far more efficient.
Change the code to do this. While at it, also clean it up and don't
use so many variables - just store the address in the callback args
directly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.
There are known limitations currently:
doesn't work for initial packets, since we only query the ct table.
Fine given use case is for returning packets
no implicit defrag.
frags should be rare so fix later..
won't work for more complex tasks, e.g. lookup of other extensions
since we have no means to store results
we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.
V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA-enabled master network devices with a switch tagging protocol should
strip the protocol specific format before handing the frame over to
higher layer.
When adding such a DSA master network device as a bridge member, we go
through the following code path when receiving a frame:
__netif_receive_skb_core
-> first ptype check against ptype_all is not returning any
handler for this skb
-> check and invoke rx_handler:
-> deliver frame to the bridge layer: br_handle_frame
DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is
called *after* the bridge-layer rx_handler has run. br_handle_frame()
tries to parse the frame it received from the DSA master network device,
and will not be able to match any of its conditions and jumps straight
at the end of the end of br_handle_frame() and returns
RX_HANDLER_CONSUMED there.
Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops
RX processing for this frame and returns NET_RX_SUCCESS, so we never get
a chance to call our switch tag packet processing logic and deliver
frames to the DSA slave network devices, and so we do not get any
functional bridge members at all.
Instead of cluttering the bridge receive path with DSA-specific checks,
and rely on assumptions about how __netif_receive_skb_core() is
processing frames, we simply deny adding the DSA master network device
(conduit interface) as a bridge member, leaving only the slave DSA
network devices to be bridge members, since those will work correctly in
all circumstances.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic to configure a network interface for kernel IP
auto-configuration is very simplistic, and does not handle the case
where a device is stacked onto another such as with DSA. This causes the
kernel not to open and configure the master network device in a DSA
switch tree, and therefore slave network devices using this master
network devices as conduit device cannot be open.
This restriction comes from a check in net/dsa/slave.c, which is
basically checking the master netdev flags for IFF_UP and returns
-ENETDOWN if it is not the case.
Automatically bringing-up DSA master network devices allows DSA slave
network devices to be used as valid interfaces for e.g: NFS root booting
by allowing kernel IP autoconfiguration to succeed on these interfaces.
On the reverse path, make sure we do not attempt to close a DSA-enabled
device as this would implicitely prevent the slave DSA network device
from operating.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.
See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.
[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In fact, it allows to have a symetry between
get and set rtnl messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels).
With this attribute, it's possible to interpret correctly all advertised
information (like IFLA_LINK, etc.).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where it is added (ie valid only into this
netns).
The main function (ie the one exported to other module), peernet2id(), allows to
get the id of a peer netns. If no id has been assigned by the user, this
function allocates one.
These ids will be used in netlink messages to point to a peer netns, for example
in case of a x-netns interface.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While suspending, we destroy the authentication /
association that might be taking place. While doing so, we
forgot to delete the timer which can be firing after
local->suspended is already set, producing the warning below.
Fix that by deleting the timer.
[66722.825487] WARNING: CPU: 2 PID: 5612 at net/mac80211/util.c:755 ieee80211_can_queue_work.isra.18+0x32/0x40 [mac80211]()
[66722.825487] queueing ieee80211 work while going to suspend
[66722.825529] CPU: 2 PID: 5612 Comm: kworker/u16:69 Tainted: G W O 3.16.1+ #24
[66722.825537] Workqueue: events_unbound async_run_entry_fn
[66722.825545] Call Trace:
[66722.825552] <IRQ> [<ffffffff817edbb2>] dump_stack+0x4d/0x66
[66722.825556] [<ffffffff81075cad>] warn_slowpath_common+0x7d/0xa0
[66722.825572] [<ffffffffa06b5b90>] ? ieee80211_sta_bcn_mon_timer+0x50/0x50 [mac80211]
[66722.825573] [<ffffffff81075d1c>] warn_slowpath_fmt+0x4c/0x50
[66722.825586] [<ffffffffa06977a2>] ieee80211_can_queue_work.isra.18+0x32/0x40 [mac80211]
[66722.825598] [<ffffffffa06977d5>] ieee80211_queue_work+0x25/0x50 [mac80211]
[66722.825611] [<ffffffffa06b5bac>] ieee80211_sta_timer+0x1c/0x20 [mac80211]
[66722.825614] [<ffffffff8108655a>] call_timer_fn+0x8a/0x300
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This reverts commit ba1debdfed.
Oliver reported that it breaks network-manager, for some reason with
this patch NM decides that the device isn't wireless but "generic"
(ethernet), sees no carrier (as expected with wifi) and fails to do
anything else with it.
Revert this to unbreak userspace.
Reported-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 02dba4388d ("bridge: fix setlink/dellink notifications") removed usage of oflags in
both rtnl_bridge_setlink() and rtnl_bridge_dellink() methods. This patch removes this variable as it is no
longer needed.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 053c095a82 ("netlink: make nlmsg_end() and genlmsg_end()
void") didn't catch all of the cases where callers were breaking out
on the return value being equal to zero, which they no longer should
when zero means success.
Fix all such cases.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.
This makes the very common pattern of
if (genlmsg_end(...) < 0) { ... }
be a whole bunch of dead code. Many places also simply do
return nlmsg_end(...);
and the caller is expected to deal with it.
This also commonly (at least for me) causes errors, because it is very
common to write
if (my_function(...))
/* error condition */
and if my_function() does "return nlmsg_end()" this is of course wrong.
Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.
Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did
- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;
I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.
One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 07f6c4bc (tipc: convert tipc reference table to use generic
rhashtable) introduced a problem with port listing in the new netlink
API. It broke the resume functionality resulting in a never ending
loop. This was caused by starting with the first hash table every time
subsequently never returning an empty skb (terminating).
This patch fixes the resume mechanism by keeping a logical reference
to the last hash table along with a logical reference to the socket
(port) that didn't fit in the previous message.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-01-16
Here are some more bluetooth & ieee802154 patches intended for 3.20:
- Refactoring & cleanups of ieee802154 & 6lowpan code
- Various fixes to the btmrvl driver
- Fixes for Bluetooth Low Energy Privacy feature handling
- Added build-time sanity checks for sockaddr sizes
- Fixes for Security Manager registration on LE-only controllers
- Refactoring of broken inquiry mode handling to a generic quirk
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch benefits from newly introduced switchdev notifier and uses it
to propagate fdb learn events from rocker driver to bridge. That avoids
direct function calls and possible use by other listeners (ovs).
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces new notifier for purposes of exposing events which happen
on switch driver side. The consumers of the event messages are mainly involved
masters, namely bridge and ovs.
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This field already contains the length of the iovec, no need to calculate it
again.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.
Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.
The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.
strace from example application (shortened):
socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0
tcpdump before patch (fooling the application):
22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]
tcpdump after patch:
14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]
Looks like this bug is from the pre-git history museum. ;)
Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was suggested by DaveM to change the name as "len" might indicate
unit bytes.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This action provides a possibility to exec custom BPF code.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
problems with bridge getlink/setlink notifications today:
- bridge setlink generates two notifications to userspace
- one from the bridge driver
- one from rtnetlink.c (rtnl_bridge_notify)
- dellink generates one notification from rtnetlink.c. Which
means bridge setlink and dellink notifications are not
consistent
- Looking at the code it appears,
If both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF were set,
the size calculation in rtnl_bridge_notify can be wrong.
Example: if you set both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF
in a setlink request to rocker dev, rtnl_bridge_notify will
allocate skb for one set of bridge attributes, but,
both the bridge driver and rocker dev will try to add
attributes resulting in twice the number of attributes
being added to the skb. (rocker dev calls ndo_dflt_bridge_getlink)
There are multiple options:
1) Generate one notification including all attributes from master and self:
But, I don't think it will work, because both master and self may use
the same attributes/policy. Cannot pack the same set of attributes in a
single notification from both master and slave (duplicate attributes).
2) Generate one notification from master and the other notification from
self (This seems to be ideal):
For master: the master driver will send notification (bridge in this
example)
For self: the self driver will send notification (rocker in the above
example. It can use helpers from rtnetlink.c to do so. Like the
ndo_dflt_bridge_getlink api).
This patch implements 2) (leaving the 'rtnl_bridge_notify' around to be used
with 'self').
v1->v2 :
- rtnl_bridge_notify is now called only for self,
so, remove 'BRIDGE_FLAGS_SELF' check and cleanup a few things
- rtnl_bridge_dellink used to always send a RTM_NEWLINK msg
earlier. So, I have changed the notification from br_dellink to
go as RTM_NEWLINK
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to the problem Jeff Layton reported, I looked at the code
and reproduced the same warning by subscribing and removing the genl
family with a socket still open. This is a fairly tricky race which
originates in the fact that generic netlink allows the family to go
away while sockets are still open - unlike regular netlink which has
a module refcount for every open socket so in general this cannot be
triggered.
Trying to resolve this issue by the obvious locking isn't possible as
it will result in deadlocks between unregistration and group unbind
notification (which incidentally lockdep doesn't find due to the home
grown locking in the netlink table.)
To really resolve this, introduce a "closing socket" reference counter
(for generic netlink only, as it's the only affected family) in the
core netlink code and use that in generic netlink to wait for all the
sockets that are being closed at the same time as a generic netlink
family is removed.
This fixes the race that when a socket is closed, it will should call
the unbind, but if the family is removed at the same time the unbind
will not find it, leading to the warning. The real problem though is
that in this case the unbind could actually find a new family that is
registered to have a multicast group with the same ID, and call its
mcast_unbind() leading to confusing.
Also remove the warning since it would still trigger, but is now no
longer a problem.
This also moves the code in af_netlink.c to before unreferencing the
module to avoid having the same problem in the normal non-genl case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Layton reported that he could trigger the multicast unbind warning
in generic netlink using trinity. I originally thought it was a race
condition between unregistering the generic netlink family and closing
the socket, but there's a far simpler explanation: genetlink currently
allows subscribing to groups that don't (yet) exist, and the warning is
triggered when unsubscribing again while the group still doesn't exist.
Originally, I had a warning in the subscribe case and accepted it out of
userspace API concerns, but the warning was of course wrong and removed
later.
However, I now think that allowing userspace to subscribe to groups that
don't exist is wrong and could possibly become a security problem:
Consider a (new) genetlink family implementing a permission check in
the mcast_bind() function similar to the like the audit code does today;
it would be possible to bypass the permission check by guessing the ID
and subscribing to the group it exists. This is only possible in case a
family like that would be dynamically loaded, but it doesn't seem like a
huge stretch, for example wireless may be loaded when you plug in a USB
device.
To avoid this reject such subscription attempts.
If this ends up causing userspace issues we may need to add a workaround
in af_netlink to deny such requests but not return an error.
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value from nl80211_send_station() is the length of the
skb, or a negative error, so abort sending the message only when
the return value was negative.
This fixes the ibss_rsn wpa_supplicant test case.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove the function hci_conn_change_link_key() that is not used anywhere.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The patch c5adde9468 ("netlink:
eliminate nl_sk_hash_lock") introduced a bug where the EADDRINUSE
error has been replaced by ENOMEM. This patch rectifies that
problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.
A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.
Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.
This version is better because we do not slow down packet processing,
only make migration safer.
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as
struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;
The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.
An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is just a cleanup, because in the current code MDBA_SET_ENTRY_MAX ==
MDBA_SET_ENTRY.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
one for a deadlock in regulatory processing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUt7Z6AAoJEDBSmw7B7bqrgxoQALEEWcIJ1wmu+M7ijdiXLiUM
vRNuxGENwIgfdmoTs6R7pgKhEYFzePWccjHOzt9cQB5efdRzDjrxj2fDrPf4o5JB
7of2uHGoaD2RI2H+pJS1URT8igmxDJii+bOEzHn/WL730Hgr2J2iuJizxZ2lzsVM
VKkiwOykV3kfN5MGsj7yvJQXR32DlGfmiT86+3bjNhE8hgU38NgE0TeUUnyF0AS9
jLV5mpJfkLmZyZmnvszV5tiqQQmQAdHImI+vbHuhzNUUAn6RLswxbWBzUrLXpXqu
5KBR2P/6TU4X89NcYGm+JhTI9PghsMbh1zDuqDQ9gq8j0mrV7Kzh1K6LdYoVpfXf
s42gHe32+Mh0l6LRTlsjftMxJbFla7I6madPcVTqJCV2y1LocD1BseJ+qX5bngU1
lBSSbzE9MlAl5gyHVDh1CAV+8FM0CP8Ff3WtAyr8XtDxfAUwmo3xBqmL8pvLq6nh
49kDqDVOzC5KzASYIjqBwmRMcqW2AnaNQG64iIOzM3ure/l5trncPHHPsMkxgwu+
dDgEXwjWhJNaxWt7fcTSZndATLCRvkeb6ZeRoqmY6A2GJgzpUIhm6HETXc9BNGbg
3J56176xx04LYg6U5+vMiU5A+gFjlrUknQ3MGXF0KPgw0MvtSyempobV68Lpul4r
6DviuT9NiRqxloaBimyx
=bMKg
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Just two fixes - one for an uninialized variable and
one for a deadlock in regulatory processing.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have
* a lot of regulatory code changes to deal with the
way newer Intel devices handle this
* a change to drop packets while disconnecting from
an AP instead of trying to wait for them
* a new attempt at improving the tailroom accounting
to not kick in too much for performance reasons
* improvements in wireless link statistics
* many other small improvements and small fixes that
didn't seem necessary for 3.19 (e.g. in hwsim which
is testing only code)
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUt7WEAAoJEDBSmw7B7bqrVBoP/2EViE62HMmXdqG1SZWz8q9o
Iigq8STC/sT2WCx1pYm+tKuVW4LD2O3mCriGNP8A3RwzDZ6H7sKJYb1gV6QCPV6f
4+yT5VSAB3D3lHmp/bbyNsmKCBQ5uS4LVgDrokrkbGpacDu94PYS5Wv9t3x6PBVB
5Xjky6g6A/pSuxTIstSO9k5xkzNjaB1TxvVRz/gJrGcFQVkDFSlVbuTHUVxs8p+p
k6mwY/2WYijZkswWZVQTJLQlF9vRI7PYkKs5m8gz4pjNU48oFJoyu4IP3Z1Xj/Sm
zgT1C9rgp0Du74HYO2niGAvLWgKajAZuW5hIacDndUPjYQQBLgGs/bCJGSntM+x9
XoOdPixdFPT/58ijyYZlmHc8rxPOd2kHsVbwGplp8f195S4VO04D+ejfOaoAUFwX
v/kMvO3XIFmEH1jjkDAC3OTcRMYVMuENyWl7pFzxHIzPeRiEpQUd9iSdM4yol0F2
ZyWvKud4U75Sh+aCiDIIBETtdfCRFe12hgKs4COYbI/UYkGPTPrNei/uisopdubT
JC+7pZOYdSgoX12yVi6ds6DmKE/ZpIQyhIK4wTWgVoszbnfdb9Mw7mJEThwNRjeK
JJPsbuty7u8HWjXzEqHLoTV3BFv1cgRSJc5Wt0zfME+LzD7iQpEpv+QBAguwwChD
Osn55Z3FnKEmBdGcOIje
=vaEW
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-01-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Here's a big pile of changes for this round.
We have
* a lot of regulatory code changes to deal with the
way newer Intel devices handle this
* a change to drop packets while disconnecting from
an AP instead of trying to wait for them
* a new attempt at improving the tailroom accounting
to not kick in too much for performance reasons
* improvements in wireless link statistics
* many other small improvements and small fixes that
didn't seem necessary for 3.19 (e.g. in hwsim which
is testing only code)
Conflicts:
drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c
Minor overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
RAW sockets with hdrinc suffer from contention on rt_uncached_lock
spinlock.
One solution is to use percpu lists, since most routes are destroyed
by the cpu that created them.
It is unclear why we even have to put these routes in uncached_list,
as all outgoing packets should be freed when a device is dismantled.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: caacf05e5a ("ipv4: Properly purge netdev references on uncached routes.")
Signed-off-by: David S. Miller <davem@davemloft.net>
For some reason, we made the bandwidth separate flags, which
is rather confusing - a single rate cannot have different
bandwidths at the same time.
Change this to no longer be flags but use a separate field
for the bandwidth ('bw') instead.
While at it, add support for 5 and 10 MHz rates - these are
reported as regular legacy rates with their real bitrate,
but tagged as 5/10 now to make it easier to distinguish them.
In the nl80211 API, the flags are preserved, but the code
now can also clearly only set a single one of the flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the SMP channels have been already registered, then print out a
clear WARN_ON message that something went wrong. Also unregister the
existing channels in this case before trying to register new ones.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The comparing of chan->src should always be done against the local
identity address, represented by hcon->src and hcon->src_type. This
patch modifies l2cap_global_fixed_chan() to take the full hci_conn so
that we can easily compare against hcon->src and hcon->src_type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The current bdaddr_type() usage in l2cap_core.c is a bit funny in that
it's always passed a hci_conn + a hci_conn member. Because of this only
the hci_conn is really needed. Since the second parameter is always
either hcon->src_type or hcon->dst type this patch adds two helper
functions for each purpose: bdaddr_src_type() and bdaddr_dst_type().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
These rates are treated the same as 160 MHz in the spec, so
it makes no sense to distinguish them. As no driver uses them
yet, this is also not a problem, just remove them.
In the userspace API the field remains reserved to preserve
API and ABI.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These rates are treated the same as 160 MHz in the spec,
so it makes no sense to distinguish them. As no driver
uses them yet, this is also not a problem, just remove
them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the controller gets powered on via the management interface, then
register the supported SMP channels. There is no point in registering
these channels earlier since it is not know what identity address the
controller is going to operate with.
When powering down a controller unregister all SMP channels. This is
required since a powered down controller is allowed to change its
identity address.
In addition the SMP channels are only available when the controller
is powered via the management interface. When using legacy ioctl, then
Bluetooth Low Energy is not supported and registering kernel side SMP
integration may actually cause confusion.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When LE features are not supported, then do not bother registering any
kind of SMP channel.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The source address and source address type of the LE SMP channel can
either be the public address of the controller or the static random
address configured by the host.
Right now the public address is used for the LE SMP channel and
obviously that is not correct if the controller operates with the
configured static random address.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
For dual-mode controllers it is possible to disable BR/EDR and operate
as LE single mode controllers with a static random address. If that is
the case, then refuse switching BR/EDR back on after the controller has
been powered.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The devices address types are BR/EDR Public, LE Public and LE Random and
any of these three is valid for L2CAP connections. So show the correct
type in the debugfs list.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Pablo Neira Ayuso says:
====================
netfilter updates for net-next
The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:
1) Rise default number of buckets in conntrack from 16384 to 65536 in
systems with >= 4GBytes, patch from Marcelo Leitner.
2) Small refactor to save one level on indentation in xt_osf, from
Joe Perches.
3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.
4) Another small cleanup to remove redundant variable in nfnetlink,
from Duan Jiong.
5) Fix compilation warning in nfnetlink_cthelper on parisc, from
Chen Gang.
6) Fix wrong format in debugging for ctseqadj, from Gao feng.
7) Selective conntrack flushing through the mark for ctnetlink, patch
from Kristian Evensen.
8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
not required anymore after the selective flushing patch, again from
Kristian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.
ovs-vsctl add-port br0 vxlan0 -- \
set Interface vxlan0 type=vxlan options:exts=gbp
The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.
The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
internally just like Geneve options but transported as nested Netlink
attributes to user space.
Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.
The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
nlattr_set() is currently hardcoded to two levels of nesting. This change
introduces struct ovs_len_tbl to define minimal length requirements plus
next level nesting tables to traverse the key attributes to arbitrary depth.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also factors out Geneve validation code into a new separate function
validate_and_copy_geneve_opts().
A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.
The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.
SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.
Host 1: Host 2:
Group A Group B Group B Group A
+-----+ +-------------+ +-------+ +-----+
| lxc | | SELinux CTX | | httpd | | VM |
+--+--+ +--+----------+ +---+---+ +--+--+
\---+---/ \----+---/
| |
+---+---+ +---+---+
| vxlan | | vxlan |
+---+---+ +---+---+
+------------------------------+
Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP frames. The extension is therefore disabled by default
and needs to be specifically enabled:
ip link add [...] type vxlan [...] gbp
In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.
Examples:
iptables:
host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
OVS:
# ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
# ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Don't use uninitialized data in IPVS, from Dan Carpenter.
2) conntrack race fixes from Pablo Neira Ayuso.
3) Fix TX hangs with i40e, from Jesse Brandeburg.
4) Fix budget return from poll calls in dnet and alx, from Eric
Dumazet.
5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
Jaeger.
6) Fix bug introduced by conversion to list_head in TIPC retransmit
code, from Jon Paul Maloy.
7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
Khoroshilov.
8) Fix bridge build with INET disabled, from Arnd Bergmann.
9) Fix netlink array overrun for PROBE attributes in openvswitch, from
Thomas Graf.
10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
Prashant Sreedharan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
tg3: Release tp->lock before invoking synchronize_irq()
tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
openvswitch: packet messages need their own probe attribtue
i40e: adds FCoE configure option
cxgb4vf: Fix queue allocation for 40G adapter
netdevice: Add missing parentheses in macro
bridge: only provide proxy ARP when CONFIG_INET is enabled
neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
net: fec: fix MDIO bus assignement for dual fec SoC's
xen-netfront: use different locks for Rx and Tx stats
drivers: net: cpsw: fix multicast flush in dual emac mode
cxgb4vf: Initialize mdio_addr before using it
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
MAINTAINERS: add me as ibmveth maintainer
tipc: fix bug in broadcast retransmit code
update ip-sysctl.txt documentation (v2)
net/at91_ether: prepare and unprepare clock
...
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.
Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.
Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removing PSM setting debugfs interface as the IPSP has a well
defined PSM value that should be used.
The patch introduces enable flag that can be used to toggle
6lowpan on/off.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
According to the Bluetooth core specification valid identity addresses
are either Public Device Addresses or Static Random Addresses. IRKs
received with any other type of address should be discarded since we
cannot assume to know the permanent identity of the peer device.
This patch fixes a missing check for the Identity Address when receiving
the Identity Address Information SMP PDU.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.17+
There are too many brackets. Maybe only one bracket is enough.
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce ovs_tunnel_route_lookup to consolidate route lookup
shared by vxlan, gre, and geneve ports.
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IPV4 support is disabled, we cannot call arp_send from
the bridge code, which would result in a kernel link error:
net/built-in.o: In function `br_handle_frame_finish':
:(.text+0x59914): undefined reference to `arp_send'
:(.text+0x59a50): undefined reference to `arp_tbl'
This makes the newly added proxy ARP support in the bridge
code depend on the CONFIG_INET symbol and lets the compiler
optimize the code out to avoid the link error.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 958501163d ("bridge: Add support for IEEE 802.11 Proxy ARP")
Cc: Kyeyoon Park <kyeyoonp@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variable 'controller' is assigned a value that is never used.
Identified by cppcheck tool.
Signed-off-by: Gowtham Anandha Babu <gowtham.ab@samsung.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Send the netdetect configuration information in the response to
NL8021_CMD_GET_WOWLAN commands. This includes the scan interval,
SSIDs to match and frequencies to scan.
Additionally, add the NL80211_WOWLAN_TRIG_NET_DETECT with
NL80211_ATTR_WOWLAN_TRIGGERS_SUPPORTED.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a system contains only self-managed regulatory devices all hints
from the regulatory core are ignored. Stop hint processing early in this
case. These systems usually don't have CRDA deployed, which results in
endless (irrelevent) logs of the form:
cfg80211: Calling CRDA to update world regulatory domain
Make sure there's at least one self-managed device before discarding a
hint, in order to prevent initial hints from disappearing on CRDA
managed systems.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A self-managed device will sometimes need to set its regdomain synchronously.
Notably it should be set before usermode has a chance to query it. Expose
a new API to accomplish this which requires the RTNL.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Radar detection can last indefinite time. There is no
point in deferring a scan request in this case - simply
return -EBUSY.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ctx->conf.radar_enabled should reflect whether radar
detection is enabled for the channel context.
When calculating it, make it consider only the vifs
that have this context assigned (instead of all the
vifs).
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
local->radar_detect_enabled should tell whether
radar_detect is enabled on any interface belonging
to local.
However, it's not getting updated correctly
in many cases (actually, when testing with hwsim
it's never been set, even when the dfs master
is beaconing).
Instead of handling all the corner cases
(e.g. channel switch), simply check whether
radar detection is enabled only when needed,
instead of caching the result.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function adding the supported channels IE during a TDLS connection had
several issues:
1. If the entire subband is usable, the function exitted the loop without
adding it
2. The function only checked chandef_usable, ignoring flags like RADAR
which would prevent TDLS off-channel communcation.
3. HT20 was explicitly required in the chandef, while not a requirement
for TDLS off-channel.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When roaming / suspending, it makes no sense to wait until
the transmit queues of the device are empty. In extreme
condition they can be starved (VO saturating the air), but
even in regular cases, it is pointless to delay the roaming
because the low level driver is trying to send packets to
an AP which is far away. We'd rather drop these packets and
let TCP retransmit if needed. This will allow to speed up
the roaming.
For suspend, the explanation is even more trivial.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The duration variable for the selftests is unsigned long long and with
that use %llu instead of %lld when printing the results.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This moves the execution of Delete Stored Link Key command to the
hci_init4_req phase. No actual code has been changed. The command
is just executed at a later stage of the initialization.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When setting base_reachable_time or base_reachable_time_ms on a
specific interface through sysctl or netlink, the reachable_time
value is not updated.
This means that neighbour entries will continue to be updated using the
old value until it is recomputed in neigh_period_work (which
recomputes the value every 300*HZ).
On systems with HZ equal to 1000 for instance, it means 5mins before
the change is effective.
This patch changes this behavior by recomputing reachable_time after
each set on base_reachable_time or base_reachable_time_ms.
The new value will become effective the next time the neighbour's timer
is triggered.
Changes are made in two places: the netlink code for set and the sysctl
handling code. For sysctl, I use a proc_handler. The ipv6 network
code does provide its own handler but it already refreshes
reachable_time correctly so it's not an issue.
Any other user of neighbour which provide its own handlers must
refresh reachable_time.
Signed-off-by: Jean-Francois Remy <jeff@melix.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
tc code implicitly considers skb->protocol even in case of accelerated
vlan paths and expects vlan protocol type here. However, on rx path,
if the vlan header was already stripped, skb->protocol contains value
of next header. Similar situation is on tx path.
So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f2f9800d49 "tipc: make tipc node table aware of net
namespace" has added a dereference of sock->sk before making sure it's
not NULL, which makes releasing a tipc socket NULL pointer dereference
for sockets that are not fully initialized.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the redundant timer defined in tipc_sock structure, instead we
can directly reuse the sk_timer defined in sock structure.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/bridge/br_netlink.c: In function ‘br_fill_ifinfo’:
net/bridge/br_netlink.c:146:32: warning: ‘vid_range_flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
err = br_fill_ifvlaninfo_range(skb, vid_range_start,
^
net/bridge/br_netlink.c:108:6: note: ‘vid_range_flags’ was declared here
u16 vid_range_flags;
Reported-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
version.h inclusion is not necessary as detected by versioncheck.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With TLP, the peer may reply to a probe with an
ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
such ACK+DSACK will be missed and only at next, higher ack will the TLP
episode be considered done. Since the DSACK is not present anymore,
this will cost a cwnd reduction.
This patch ensures that this scenario does not cause a cwnd reduction, since
receiving an ACK+DSACK indicates that both the initial segment and the probe
have been received by the peer.
The following packetdrill test, from Neal Cardwell, validates this patch:
// Establish a connection.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 1 packet.
+0 write(4, ..., 1000) = 1000
+0 > P. 1:1001(1000) ack 1
// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0 %{ assert tcpi_snd_cwnd == 10 }%
// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0 %{ assert tcpi_snd_cwnd == 10 }%
// Send another packet.
+0 write(4, ..., 1000) = 1000
+0 > P. 1001:2001(1000) ack 1
// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
Credits:
-Gregory helped in finding that tcp_process_tlp_ack was where the cwnd
got reduced in our MPTCP tests.
-Neal wrote the packetdrill test above
-Yuchung reworked the patch to make it more readable.
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Nandita Dukkipati <nanditad@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As rhashtable_lookup_compare_insert() can guarantee the process
of search and insertion is atomic, it's safe to eliminate the
nl_sk_hash_lock. After this, object insertion or removal will
be protected with per bucket lock on write side while object
lookup is guarded with rcu read lock on read side.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
netif_alloc_rx_queues() uses kcalloc() to allocate memory
for "struct netdev_queue *_rx" array.
If we are doing large rx queue allocation kcalloc() might
fail, so this patch does a fallback to vzalloc().
Similar implementation is done for tx queue allocation in
netif_alloc_netdev_queues().
We avoid failure of high order memory allocation
with the help of vzalloc(), this allows us to do large
rx and tx queue allocation which in turn helps us to
increase the number of queues in tun.
As vmalloc() adds overhead on a critical network path,
__GFP_REPEAT flag is used with kzalloc() to do this fallback
only when really needed.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the function teql_neigh_release() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the function aead_entries() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>