sctp_inq is never kmalloced, since it's integrated into sctp_ep_common
and only initialized from eps and assocs. Therefore, remove the dead
code from there.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_ssnmap_init() can only be called from sctp_ssnmap_new(),
where malloced is always set to 1. Thus, when we call
sctp_ssnmap_free(), the test for map->malloced always evaluates
to true.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
A number of improvements for net-next/3.10.
Highlights include:
* Properly exposing linux/openvswitch.h to userspace after the uapi
changes.
* Simplification of locking. It immediately makes things simpler to
reason about and avoids holding RTNL mutex for longer than
necessary. In the near future it will also enable tunnel
registration and more fine-grained locking.
* Miscellaneous cleanups and simplifications.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to add a dozen unions at the start of the
function each time. So, do this once and use it instead. Thus, we
can remove some duplicate code and make it more readable.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: Add buffer utilization fields to /proc/net/sctp/assocs
This patch adds the following fields to /proc/net/sctp/assocs output:
- sk->sk_wmem_alloc as "wmema" (transmit queue bytes committed)
- sk->sk_wmem_queued as "wmemq" (persistent queue size)
- sk->sk_sndbuf as "sndbuf" (size of send buffer in bytes)
- sk->sk_rcvbuf as "rcvbuf" (size of receive buffer in bytes)
When small DATA chunks containing 136 bytes of data are sent, TX_QUEUE
(assoc->sndbuf_used) reaches a maximum of 40.9% of the sk_sndbuf value when
peer.rwnd = 0. This was diagnosed from the sk_wmem_alloc value reaching the
maximum value of sk_sndbuf.
TX_QUEUE (assoc->sndbuf_used), sk_wmem_alloc and sk_wmem_queued values are
incremented in sctp_set_owner_w() for outgoing data chunks. Having access to
the above values in /proc/net/sctp/assocs will provide a better understanding
of SCTP buffer management.
With the patch applied, example output when peer.rwnd = 0, where
ASSOC ffff880132298000 is the sender and
ffff880125343000 is the receiver:
ASSOC SOCK STY SST ST HBKT ASSOC-ID TX_QUEUE RX_QUEUE \
ffff880132298000 ffff880124a0a0c0 2 1 3 29325 1 214656 0 \
ffff880125343000 ffff8801237d7700 2 1 3 36210 2 0 524520 \
UID INODE LPORT RPORT LADDRS <-> RADDRS HBINT INS OUTS \
0 25108 3455 3456 *10.4.8.3 <-> *10.5.8.3 7500 2 2 \
0 27819 3456 3455 *10.5.8.3 <-> *10.4.8.3 7500 2 2 \
MAXRT T1X T2X RTXC wmema wmemq sndbuf rcvbuf
4 0 0 72 525633 440320 524288 524288
4 0 0 0 1 0 524288 524288
Signed-off-by: Dilip Daya <dilip.daya@hp.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update debugging messages to a more current style.
Emit these debugging messages at KERN_DEBUG instead
of KERN_DEFAULT.
Add and use neigh_dbg(level, fmt, ...) macro
Add dynamic_debug capability via pr_debug
Convert embedded function names to "%s: ", __func__
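For illustration, the added macro in net/core/neighbour.c is along these
lines (a minimal sketch; NEIGH_DEBUG is the file-local verbosity level):

  #define neigh_dbg(level, fmt, ...)             \
  do {                                           \
          if (level <= NEIGH_DEBUG)              \
                  pr_debug(fmt, ##__VA_ARGS__);  \
  } while (0)

This keeps the compile-time verbosity check while routing the actual
output through pr_debug() and thus dynamic debug.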
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than defining an OVS-specific stats struct (vport_percpu_stats),
we can use the existing pcpu_tstats to achieve exactly the same functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Currently OVS uses a combination of the genl and rtnl locks to protect
the datapath state. This was done due to networking stack locking.
But this has complicated locking, and there are a few lock-ordering
issues with new tunneling protocols.
The following patch simplifies locking by introducing a new ovs mutex,
which is now used to protect the entire ovs state.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
The current implementation of dev_uc_sync/unsync() assumes that there is
a strict 1-to-1 relationship between the source and destination of the sync.
In other words, once an address has been synced to a destination device, it
will not be synced to any other device through the sync API.
However, there are some virtual devices that aggregate a number of lower
devices and need to sync addresses to all of them. The current
API falls short there.
This patch introduces a new dev_uc_sync_multiple() API that can be called
in the above circumstances and allows the sync to work for every invocation.
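A minimal sketch of how an aggregating driver might use it (the driver,
its private struct and port list are hypothetical; only the
dev_uc_sync_multiple() call itself comes from this patch):

  /* ndo_set_rx_mode of a hypothetical aggregating device */
  static void agg_set_rx_mode(struct net_device *upper)
  {
          struct agg_priv *priv = netdev_priv(upper);   /* hypothetical */
          struct agg_port *port;                        /* hypothetical */

          /* sync the same unicast list to every lower device */
          list_for_each_entry(port, &priv->ports, list)
                  dev_uc_sync_multiple(port->lower_dev, upper);
  }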
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since dead only holds two states (0,1), make it a bool instead
of a 'char', which is more appropriate for its purpose.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is actually no need to keep this member in the structure, because
after init it's always 1 anyway, and thus kfree is always called. This
seems to be an ancient leftover from the very initial implementation from
the 2.5 days. Only in case the initialization of an association fails do we
leave base.malloced as 0, but we nevertheless kfree it in the error path of
sctp_association_new().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 10b96f7306 (``tcp_memcontrol: remove a redundant statement
in tcp_destroy_cgroup()'') says ``We read the value but make no use
of it.'', but forgot to remove the variable declaration as well. This
was a follow-up commit of 3f1346193 (``memcg: decrement static keys
at real destroy time'') that removed the read of variable 'val'.
This therefore fixes:
CC net/ipv4/tcp_memcontrol.o
net/ipv4/tcp_memcontrol.c: In function ‘tcp_destroy_cgroup’:
net/ipv4/tcp_memcontrol.c:67:6: warning: unused variable ‘val’ [-Wunused-variable]
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a non-existent error case in the
output path.
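To illustrate (a simplified caller, not any particular file): call sites
used to carry dead error handling like

  err = sock_tx_timestamp(sk, &ipc.tx_flags);
  if (err)
          goto out;

which with the void signature simply becomes

  sock_tx_timestamp(sk, &ipc.tx_flags);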
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can move the th->check computation out of the loop, as the compiler
doesn't know that each skb initially shares the same TCP header after
skb_segment().
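Roughly, the change looks like this (a sketch of the idea, not the exact
diff): the adjusted checksum is computed once before the loop and reused
for every segment sharing the copied header:

  newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
                                         (__force u32)delta));

  while (skb->next) {
          th->fin = th->psh = 0;
          th->check = newcheck;   /* precomputed once outside the loop */
          skb = skb->next;
          th = tcp_hdr(skb);
  }
  /* the last segment still gets its own th->check, since its length
   * can differ from the others */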
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that TSQ (TCP Small Queues) was less effective when TSO is
turned off and GSO is on. If BQL is not enabled, TSQ then has no
effect.
It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.
We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.
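Conceptually, the fix hands the socket accounting over from the original
gso_skb to the last generated segment (a sketch; 'last_seg' stands for the
final skb produced by skb_segment(), and the swap only happens when the
original destructor was tcp_wfree):

  if (gso_skb->destructor == tcp_wfree) {
          swap(gso_skb->sk, last_seg->sk);
          swap(gso_skb->destructor, last_seg->destructor);
          swap(gso_skb->truesize, last_seg->truesize);
  }

so tcp_wfree() runs at TX completion of the last fragment rather than
when the GSO engine frees gso_skb.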
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_csum_skb_nextlayer() / pskb_may_pull() can change skb->head, so we
must be careful not to keep pointers to the previous headers.
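The safe pattern is to (re)load the header pointer only after the pull
(illustrative only, not the exact act_csum hunk):

  struct iphdr *iph;

  if (!pskb_may_pull(skb, sizeof(*iph)))
          return 0;               /* bail out */
  iph = ip_hdr(skb);              /* skb->head may have been reallocated above */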
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Grégoire Baron <baronchon@n7mm.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 763eff57de.
It causes build regressions, as per Stephen Rothwell:
====================
After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:
net/core/netprio_cgroup.c:250:29: error: static declaration of 'net_prio_subsys' follows non-static declaration
include/linux/cgroup_subsys.h:71:1: note: previous declaration of 'net_prio_subsys' was here
====================
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Allow to avoid copying DSCP during encapsulation
by setting a SA flag. From Nicolas Dichtel.
2) Constify the netlink dispatch table, no need to modify it
at runtime. From Mathias Krause.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The callers always pass current to sock_update_netprio().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The callers always pass current to sock_update_classid().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of invalidating all IPv6 addresses with global scope
when one decides to use IPv6 tokens, we should only invalidate
previous tokens and leave the rest intact until they eventually
expire (or remain intact forever). For this less greedy
approach, we add a bool at the end of the inet6_ifaddr structure
instead, for two reasons: i) the per-inet6_ifaddr flag space is
already used up, so making it wider might not be a good idea,
and ii) we do not necessarily need to export this
information to user space.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we set the iftoken in inet6_set_iftoken(), we return -EINVAL
when the device does not have the IF_READY flag. This is, however,
not necessary and rather an artificial usability barrier, since we
can simply set the token regardless; if the device is
ready, we just send out our RS, otherwise ifup et al. will do
this for us anyway.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we check for !ipv6_addr_any(&in6_dev->token) in
addrconf_prefix_rcv(), make the token initialization on
device setup more intuitive by using in6addr_any as an
initializer.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for IPv6 tokenized IIDs, that allow
for administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix from
Router Advertisements. It is currently in draft status.
The primary target for such support is server platforms
where addresses are usually manually configured, rather
than using DHCPv6 or SLAAC. By using tokenised identifiers,
hosts can still determine their network prefix by use of
SLAAC, but more readily be automatically renumbered should
their network prefix change. [...]
The disadvantage with static addresses is that they are
likely to require manual editing should the network prefix
in use change. If instead there were a method to only
manually configure the static identifier part of the IPv6
address, then the address could be automatically updated
when a new prefix was introduced, as described in [RFC4192]
for example. In such cases a DNS server might be
configured with such a tokenised interface identifier of
::53, and SLAAC would use the token in constructing the
interface address, using the advertised prefix. [...]
http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
The implementation is partially based on top of Mark K.
Thompson's proof of concept. However, it uses the Netlink
interface for configuration and data retrieval, so that
it can be easily extended in the future. Successfully tested
by myself.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two sections checked whether the current channel != the new channel
without ever setting the current channel variables.
1. net/mac802154/tx.c: Prevent set_channel() from getting called every
time a packet is sent.
2. net/mac802154/mib.c: Lock (pib_lock) accesses to current_channel and
current_page and make sure they are updated when the channel has been
changed.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hi Greg,
I'm unsure if you or Dave should take that one, as it's, for one, a TTY
patch, but it also lives under net/. So I'm uncertain and let you decide!
Thanks,
Mathias
-- >8 --
Subject: [PATCH] TTY: ircomm, use GFP_KERNEL in ircomm_open()
We're clearly running in non-atomic context as our only call site is
able to call wait_event_interruptible(). So we're safe to use GFP_KERNEL
here instead of GFP_ATOMIC.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only call site of irda_connect_response() is irda_accept() -- a
function called from user context only. Therefore it has no need for
GFP_ATOMIC.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
irda_create() is called from user context only and therefore has no need
for GFP_ATOMIC.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull() can change skb->head, so we must init iph/greh after
calling it.
Bug added in commit c544193214 (GRE: Refactor GRE tunneling code.)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check for NULL before calling the following operations from "struct
ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
and scan_req.
This fixes a current oops where those functions are called but not
implemented. It also updates the documentation to clarify that they
are now optional by design. If a call to an unimplemented function
is attempted, the kernel returns EOPNOTSUPP via netlink.
The following operations are still required: get_phy, get_pan_id,
get_short_addr, and get_dsn.
Note that the places where this patch changes the initialization
of "ret" should not affect the rest of the code since "ret" was
always set (again) before returning its value.
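A minimal sketch of the guard (simplified; struct/field names from
include/net/ieee802154_netdev.h, the surrounding netlink handling elided):

  struct ieee802154_mlme_ops *ops = ieee802154_mlme_ops(dev);

  if (!ops->start_req)
          return -EOPNOTSUPP;   /* reported back to userspace via netlink */

  /* otherwise build the request from the netlink attributes and call
   * ops->start_req() as before */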
Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that uids and gids are completely encapsulated in kuid_t
and kgid_t, we no longer need to pass struct cred, which allowed
us to test both the uid and the user namespace for equality.
Passing struct cred potentially allows us to pass the entire group
list as BSD does but I don't believe the cost of cache line misses
justifies retaining code for a future potential application.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/nfc/microread/mei.c
net/netfilter/nfnetlink_queue_core.c
Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() will return a positive value if the packet could not be
queued, often because the real network device (in our case the mac802154
wpan device) has its queue stopped. lowpan_xmit() should handle the
positive return code (for the debug statement) and return that value to
the higher layer so the higher layer will retry sending the packet.
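Sketch of the idea (not the exact lowpan_xmit() hunk):

  err = dev_queue_xmit(skb);
  if (err)
          pr_debug("%s: dev_queue_xmit returned %d\n", __func__, err);

  return err;   /* a positive value propagates up and triggers a retry */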
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the buffer length from 10 to 300 packets. Consider that traffic on
mac802154 devices will often be 6LoWPAN, and a full-length (1280 octet)
IPv6 packet will fragment into 15 6LoWPAN fragments (because the MTU of
IEEE 802.15.4 is 127). A 300-packet queue is really 20 full-length IPv6
packets.
With a queue length of 10, an entire IPv6 packet was unable to get queued
at one time, causing fragments to be dropped, and making reassembly
impossible.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netif_stop_queue() and netif_wake_queue() to control the flow of
packets to mac802154 devices. Since many IEEE 802.15.4 devices have no
output buffer, and since the mac802154 xmit() function is designed to
block, netif_stop_queue() is called after each packet.
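In outline (simplified; the work item and workqueue names are
illustrative, the netif_* calls are the relevant part):

  /* hot path: hand the frame to the xmit worker, then stop the queue */
  netif_stop_queue(dev);
  queue_work(priv->dev_workqueue, &work->work);

  /* xmit worker: after the (blocking) ops->xmit() has finished */
  netif_wake_queue(dev);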
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ops->xmit() fails, drop the packet. Devices which support hardware
ack and retry (which include all devices currently supported by mainline)
will automatically retry sending the packet (in the hardware) up to 3
times, per the 802.15.4 spec. There is no need, and it is incorrect, to
try to do it in mac802154.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Resend with a better changelog)
garp_pdu_queue() should always be called with this spin lock held.
garp_uninit_applicant() only holds the rtnl lock, which is not
enough here. A possible race can happen as garp_pdu_rcv()
is called in BH context:
garp_pdu_rcv()
|->garp_pdu_parse_msg()
|->garp_pdu_parse_attr()
|-> garp_gid_event()
Found by code inspection.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ward <david.ward@ll.mit.edu>
Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code fails to update the msg_namelen member to 0 and therefore
makes net/socket.c leak the local, uninitialized sockaddr_storage
variable to userland -- 128 bytes of kernel stack memory.
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case we received no data on the call to skb_recv_datagram(), i.e.
skb->data is NULL, vmci_transport_dgram_dequeue() will return with 0
without updating msg_namelen, leading to net/socket.c leaking the local,
uninitialized sockaddr_storage variable to userland -- 128 bytes of
kernel stack memory.
Fix this by moving the already existing msg_namelen assignment a few
lines above.
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.
In addition to that, both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.
Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
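Sketch of the resulting set_orig_addr() (simplified; struct sockaddr_tipc
as defined in the TIPC uapi header):

  struct sockaddr_tipc *addr = (struct sockaddr_tipc *)m->msg_name;

  if (addr) {
          addr->family = AF_TIPC;
          addr->addrtype = TIPC_ADDR_ID;
          memset(&addr->addr, 0, sizeof(addr->addr));   /* no stale bytes */
          addr->addr.id.ref = msg_origport(msg);
          addr->addr.id.node = msg_orignode(msg);
          addr->scope = 0;
          m->msg_namelen = sizeof(struct sockaddr_tipc);
  }

with recv_msg()/recv_stream() additionally doing 'm->msg_namelen = 0;'
before any early successful return.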
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.
Fix the issue by initializing the memory used for sockaddr info with
memset(0).
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in llcp_sock_recvmsg() does not initialize all the members of
struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it
initialize the padding bytes of the structure inserted by the compiler
for alignment.
Also, if the socket is in state LLCP_CLOSED or is shutting down during
receive, the msg_namelen member is not updated to 0 while otherwise
returning 0, i.e. "success". The msg_namelen update is also
missing for stream and seqpacket sockets, which don't fill the sockaddr
info.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix the first issue by initializing the memory used for sockaddr info
with memset(0). Fix the second one by setting msg_namelen to 0 early.
It will be updated later if we're going to fill the msg_name member.
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>