Commit Graph

30598 Commits

Author SHA1 Message Date
Daniel Borkmann
82a37132f3 netfilter: x_tables: lightweight process control group matching
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:44 +01:00
Daniel Borkmann
86f8515f97 net: netprio: rename config to be more consistent with cgroup configs
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:42 +01:00
Daniel Borkmann
fe1217c4f3 net: net_cls: move cgroupfs classid handling into core
Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:41 +01:00
Eric Leblond
14abfa161d netfilter: xt_CT: fix error value in xt_ct_tg_check()
If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:39 +01:00
stephen hemminger
dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
stephen hemminger
02eca9d2cc netfilter: ipset: remove unused code
Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:35 +01:00
Daniel Borkmann
34ce324019 netfilter: nf_nat: add full port randomization support
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

 [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
 [2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:26 +01:00
Geert Uytterhoeven
9dcbe1b87c ipvs: Remove unused variable ret from sync_thread_master()
net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Commit 35a2af94c7 ("sched/wait: Make the
__wait_event*() interface more friendly") changed how the interruption
state is returned. However, sync_thread_master() ignores this state,
now causing a compile warning.

According to Julian Anastasov <ja@ssi.bg>, this behavior is OK:

    "Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
     users were confused why IPVS threads increase the load average. So, we
     switched to _interruptible calls and later the socket polling was
     added."

Document this, as requested by Peter Zijlstra, to avoid precious developers
disappearing in this pitfall in the future.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-12-27 12:19:32 +09:00
fan.du
6a649f3398 netfilter: add IPv4/6 IPComp extension match support
With this plugin, user could specify IPComp tagged with certain
CPI that host not interested will be DROPped or any other action.

For example:
iptables  -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP
ip6tables -A INPUT -p 108 -m ipcomp --ipcompspi 0x87 -j DROP

Then input IPComp packet with CPI equates 0x87 will not reach
upper layer anymore.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-24 12:37:58 +01:00
Valentina Giusti
08c0cad69f netfilter: nfnetlink_queue: enable UID/GID socket info retrieval
Thanks to commits 41063e9 (ipv4: Early TCP socket demux) and 421b388
(udp: ipv4: Add udp early demux) it is now possible to parse UID and
GID socket info also for incoming TCP and UDP connections. Having
this info available, it is convenient to let NFQUEUE parse it in
order to improve and refine the traffic analysis in userspace.

Signed-off-by: Valentina Giusti <valentina.giusti@bmw-carit.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-21 11:57:54 +01:00
Florian Westphal
534473c608 netfilter: ctnetlink: honor CTA_MARK_MASK when setting ctmark
Useful to only set a particular range of the conntrack mark while
leaving exisiting parts of the value alone, e.g. when setting
conntrack marks via NFQUEUE.

Follows same scheme as MARK/CONNMARK targets, i.e. the mask defines
those bits that should be altered.  No mask is equal to '~0', ie.
the old value is replaced by new one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 10:21:42 +01:00
Florian Westphal
a42b99a6e3 netfilter: avoid get_random_bytes calls
All these users need an initial seed value for jhash, prandom is
perfectly fine.  This avoids draining the entropy pool where
its not strictly required.

nfnetlink_log did not use the random value at all.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-20 10:21:40 +01:00
Florent Fourcot
6853605360 ipv6: fix incorrect type in declaration
Introduced by 1397ed35f2
  "ipv6: add flowinfo for tcp6 pkt_options for all cases"

Reported-by: kbuild test robot <fengguang.wu@intel.com>

V2: fix the title, add empty line after the declaration (Sergei Shtylyov
feedbacks)

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 16:14:09 -05:00
Jerry Chu
299603e837 net-gro: Prepare GRO stack for the upcoming tunneling support
This patch modifies the GRO stack to avoid the use of "network_header"
and associated macros like ip_hdr() and ipv6_hdr() in order to allow
an arbitary number of IP hdrs (v4 or v6) to be used in the
encapsulation chain. This lays the foundation for various IP
tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.

With this patch, the GRO stack traversing now is mostly based on
skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
skb->network_header). As a result all but the top layer (i.e., the
the transport layer) must have hdrs of the same length in order for
a pkt to be considered for aggregation. Therefore when adding a new
encap layer (e.g., for tunneling), one must check and skip flows
(e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
different hdr length.

Note that unlike the network header, the transport header can and
will continue to be set by the GRO code since there will be at
most one "transport layer" in the encap chain.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:47:53 -05:00
Jiri Benc
7e98056964 ipv6: router reachability probing
RFC 4191 states in 3.5:

   When a host avoids using any non-reachable router X and instead sends
   a data packet to another router Y, and the host would have used
   router X if router X were reachable, then the host SHOULD probe each
   such router X's reachability by sending a single Neighbor
   Solicitation to that router's address.  A host MUST NOT probe a
   router's reachability in the absence of useful traffic that the host
   would have sent to the router if it were reachable.  In any case,
   these probes MUST be rate-limited to no more than one per minute per
   router.

Currently, when the neighbour corresponding to a router falls into
NUD_FAILED, it's never considered again. Introduce a new rt6_nud_state
value, RT6_NUD_FAIL_PROBE, which suggests the route should not be used but
should be probed with a single NS. The probe is ratelimited by the existing
code. To better distinguish meanings of the failure values, rename
RT6_NUD_FAIL_SOFT to RT6_NUD_FAIL_DO_RR.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 16:02:58 -05:00
wangweidong
e477266838 sctp: remove redundant null check on asoc
In sctp_err_lookup, goto out while the asoc is not NULL, so remove the
check NULL. Also, in sctp_err_finish which called by sctp_v4_err and
sctp_v6_err, they pass asoc to sctp_err_finish while the asoc is not
NULL, so remove the check.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 15:39:34 -05:00
Yang Yingliang
6b1dd85601 sch_htb: remove unnecessary NULL pointer judgment
It already has a NULL pointer judgment of rtab in qdisc_put_rtab().
Remove the judgment outside of qdisc_put_rtab().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 15:30:17 -05:00
Nicolas Dichtel
b601fa197f ipv4: fix wildcard search with inet_confirm_addr()
Help of this function says: "in_dev: only on this interface, 0=any interface",
but since commit 39a6d06300 ("[NETNS]: Process inet_confirm_addr in the
correct namespace."), the code supposes that it will never be NULL. This
function is never called with in_dev == NULL, but it's exported and may be used
by an external module.

Because this patch restore the ability to call inet_confirm_addr() with in_dev
== NULL, I partially revert the above commit, as suggested by Julian.

CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 14:47:40 -05:00
Yang Yingliang
4f8f61eb43 net_sched: expand control flow of macro SKIP_NONLOCAL
SKIP_NONLOCAL hides the control flow. The control flow should be
inlined and expanded explicitly in code so that someone who reads
it can tell the control flow can be changed by the statement.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 12:29:26 -05:00
Ying Xue
77a7e07a78 tipc: remove unused 'blocked' flag from tipc_link struct
In early versions of TIPC it was possible to administratively block
individual links through the use of the member flag 'blocked'. This
functionality was deemed redundant, and since commit 7368dd ("tipc:
clean out all instances of #if 0'd unused code"), this flag has been
unused.

In the current code, a link only needs to be blocked for sending and
reception if it is subject to an ongoing link failover. In that case,
it is sufficient to check if the number of expected failover packets
is non-zero, something which is done via the funtion 'link_blocked()'.

This commit finally removes the redundant 'blocked' flag completely.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:43 -05:00
Ying Xue
e4d050cbf7 tipc: eliminate code duplication in media layer
Currently TIPC supports two L2 media types, Ethernet and Infiniband.
Because both these media are accessed through the common net_device API,
several functions in the two media adaptation files turn out to be
fully or almost identical, leading to unnecessary code duplication.

In this commit we extract this common code from the two media files
and move them to the generic bearer.c. Additionally, we change
the function names to reflect their real role: to access L2 media,
irrespective of type.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:43 -05:00
Ying Xue
6e967adf79 tipc: relocate common functions from media to bearer
Currently, registering a TIPC stack handler in the network device layer
is done twice, once for Ethernet (eth_media) and Infiniband (ib_media)
repectively. But, as this registration is not media specific, we can
avoid some code duplication by moving the registering function to
the generic bearer layer, to the file bearer.c, and call it only once.
The same is true for the network device event notifier.

As a side effect, the two workqueues we are using for for setting up/
cleaning up media can now be eliminated. Furthermore, the array for
storing the specific media type structs, media_array[], can be entirely
deleted.

Note that the eth_started and ib_started flags were removed during the
code relocation.  There is now only one call to bearer_setup and
bearer_cleanup, and these can logically not race against each other.

Despite its size, this cleanup work incurs no functional changes in TIPC.
In particular, it should be noted that the sequence ordering of received
packets is unaffected by this change, since packet reception never was
subject to any work queue handling in the first place.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:43 -05:00
Ying Xue
37cb062007 tipc: remove TIPC usage of field af_packet_priv in struct net_device
TIPC is currently using the field 'af_packet_priv' in struct net_device
as a handle to find the bearer instance associated to the given network
device. But, by doing so it is blocking other networking cleanups, such
as the one discussed here:

http://patchwork.ozlabs.org/patch/178044/

This commit removes this usage from TIPC. Instead, we introduce a new
field, 'tipc_ptr', to the net_device structure, to serve this purpose.
When TIPC bearer is enabled, the bearer object is associated to
'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall
from a networking device, the bearer object can now be obtained from
'tipc_ptr'. When a bearer is disabled, the bearer object is detached
from its underlying network device by setting 'tipc_ptr' to NULL.

Additionally, an RCU lock is used to protect the new pointer.
Henceforth, the existing tipc_net_lock is used in write mode to
serialize write accesses to this pointer, while the new RCU lock is
applied on the read side to ensure that the pointer is 100% valid
within its wrapped area for all readers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:42 -05:00
Jon Paul Maloy
ef72a7e02a tipc: improve naming and comment consistency in media layer
struct 'tipc_media' represents the specific info that the media
layer adaptors (eth_media and ib_media) expose to the generic
bearer layer. We clarify this by improved commenting, and by giving
the 'media_list' array the more appropriate name 'media_info_array'.

There are no functional changes in this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:42 -05:00
Jon Paul Maloy
5702dbab68 tipc: initiate media type array at compile time
Communication media types are abstracted through the struct 'tipc_media',
one per media type. These structs are allocated statically inside their
respective media file.

Furthermore, in order to be able to reach all instances from a central
location, we keep a static array with pointers to these structs. This
array is currently initialized at runtime, under protection of
tipc_net_lock. However, since the contents of the array itself never
changes after initialization, we can just as well initialize it at
compile time and make it 'const', at the same time making it obvious
that no lock protection is needed here.

This commit makes the array constant and removes the redundant lock
protection.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:42 -05:00
Ying Xue
d77b3831f7 tipc: eliminate redundant code with kfree_skb_list routine
sk_buff lists are currently relased by looping over the list and
explicitly releasing each buffer.

We replace all occurrences of this loop with a call to kfree_skb_list().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 00:17:42 -05:00
Yang Yingliang
fa08943b97 net_sched: sfq: put sfq_unlink in a do - while loop
Macros with multiple statements should be enclosed in a do - while loop

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:44:52 -05:00
Yang Yingliang
833fa74386 net_sched: add space around '>' and before '('
Spaces required around that '>' (ctx:VxV) and
before the open parenthesis '('.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:44:51 -05:00
Yang Yingliang
82d567c266 net_sched: change "foo* bar" to "foo *bar"
"foo* bar" or "foo * bar" should be "foo *bar".

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:44:51 -05:00
Yang Yingliang
1fab9abc56 net_sched: cls_bpf: use tabs to do indent
Code indent should use tabs where possible

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:44:51 -05:00
Yang Yingliang
17569faedf net_sched: remove unnecessary parentheses while return
return is not a function, parentheses are not required.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:44:51 -05:00
Yann Droneaud
d73aa2867f net: handle error more gracefully in socketpair()
This patch makes socketpair() use error paths which do not
rely on heavy-weight call to sys_close(): it's better to try
to push the file descriptor to userspace before installing
the socket file to the file descriptor, so that errors are
catched earlier and being easier to handle.

Using sys_close() seems to be the exception, while writing the
file descriptor before installing it look like it's more or less
the norm: eg. except for code used in init/, error handling
involve fput() and put_unused_fd(), but not sys_close().

This make socketpair() usage of sys_close() quite unusual.
So it deserves to be replaced by the common pattern relying on
fput() and put_unused_fd() just like, for example, the one used
in pipe(2) or recvmsg(2).

Three distinct error paths are still needed since calling
fput() on file structure returned by sock_alloc_file() will
implicitly call sock_release() on the associated socket
structure.

Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=1385979146-13825-1-git-send-email-ydroneaud@opteya.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:24:13 -05:00
stephen hemminger
8e3bff96af net: more spelling fixes
Various spelling fixes in networking stack

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 21:57:11 -05:00
Jiri Pirko
ad6c81359f ipv4: add support for IFA_FLAGS nl attribute
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 21:50:00 -05:00
Jiri Pirko
9a32b86043 dn_dev: add support for IFA_FLAGS nl attribute
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 21:50:00 -05:00
Jiri Pirko
77d47afbf3 neigh: use neigh_parms_net() to get struct neigh_parms->net pointer
This fixes compile error when CONFIG_NET_NS is not set.

Introduced by:
commit 1d4c8c2984
    "neigh: restore old behaviour of default parms values"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 17:58:44 -05:00
Jiri Pirko
971a351ccb ipv6 addrconf: revert /proc/net/if_inet6 ifa_flag format
Turned out that applications like ifconfig do not handle the change.
So revert ifa_flag format back to 2-letter hex value.

Introduced by:
commit 479840ffdb
    "ipv6 addrconf: extend ifa_flags to u32"

Reported-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Tested-by: FLorent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 17:47:18 -05:00
Florent Fourcot
ac1eabcaec ipv6: use ip6_flowinfo helper
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:50 -05:00
Florent Fourcot
3308de2b84 ipv6: add ip6_flowlabel helper
And use it if possible.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot
82e9f105a2 ipv6: remove rcv_tclass of ipv6_pinfo
tclass information in now already stored in rcv_flowinfo
We do not need to store the same information twice.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot
37cfee909c ipv6: move IPV6_TCLASS_MASK definition in ipv6.h
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot
1397ed35f2 ipv6: add flowinfo for tcp6 pkt_options for all cases
The current implementation of IPV6_FLOWINFO only gives a
result if pktoptions is available (thanks to the
ip6_datagram_recv_ctl function).
It gives inconsistent results to user space, sometimes
there is a result for getsockopt(IPV6_FLOWINFO), sometimes
not.

This patch add rcv_flowinfo to store it, and return it to
the userspace in the same way than other pkt_options.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Joe Perches
d9cd4fe5ef batadv: Slight optimization of batadv_compare_eth
Use the newly added generic routine ether_addr_equal_unaligned
to test if possibly unaligned to u16 Ethernet addresses are equal.

This slightly improves comparison time for systems with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:58:11 -05:00
Jiri Pirko
bba24896f0 neigh: ipv6: respect default values set before an address is assigned to device
Make the behaviour similar to ipv4. This will allow user to set sysctl
default neigh param values and these values will be respected even by
devices registered before (that ones what do not have address set yet).

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
1d4c8c2984 neigh: restore old behaviour of default parms values
Previously inet devices were only constructed when addresses are added.
Therefore the default neigh parms values they get are the ones at the
time of these operations.

Now that we're creating inet devices earlier, this changes the behaviour
of default neigh parms values in an incompatible way (see bug #8519).

This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.

Introduced by:
commit 8030f54499
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 22 01:53:47 2007 +0900

    [IPV4] devinet: Register inetdev earlier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
73af614aed neigh: use tbl->family to distinguish ipv4 from ipv6
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
cb5b09c17f neigh: wrap proc dointvec functions
This will be needed later on to provide better management of default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Jiri Pirko
1f9248e560 neigh: convert parms to an array
This patch converts the neigh param members to an array. This allows easier
manipulation which will be needed later on to provide better management of
default values.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:56:12 -05:00
Erik Hugne
512137eeff tipc: remove interface state mirroring in bearer
struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.

This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.

Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.

CPU0                                    CPU1
----                                    ----
Time 0: bearer_disable()                link_timeout()
Time 1:   spin_lock_bh(&b_ptr->lock)      tipc_link_push_queue()
Time 2:   tipc_link_delete()                tipc_bearer_blocked(b_ptr)
Time 3:     k_cancel_timer(&req->timer)       spin_lock_bh(&b_ptr->lock)
Time 4:       del_timer_sync(&req->timer)

I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.

We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.

An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated to the bearer immediately after an
interface goes down.

A user might notice one slight change in link behaviour after this
change. When an interface goes down, (e.g. through a NETDEV_DOWN
event) all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:30:29 -05:00
wangweidong
b73e9e3cf0 x25: convert printks to pr_<level>
use pr_<level> instead of printk(LEVEL)

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:24:18 -05:00