linux

Author	SHA1	Message	Date
Thomas Graf	6f705d8cfc	bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE Fixes: `c2d3babf` ("bridge: implement multicast fast leave") Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-26 15:29:01 -05:00
David S. Miller	958d03b016	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== netfilter/ipvs updates for net-next The following patchset contains Netfilter updates for your net-next tree, this includes the NAT redirection support for nf_tables, the cgroup support for nft meta and conntrack zone support for the connlimit match. Coming after those, a bunch of sparse warning fixes, missing netns bits and cleanups. More specifically, they are: 1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables, patches from Arturo Borrero. 2) Introduce the nf_tables redir expression, from Arturo Borrero. 3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt(). Patch from Alex Gartrell. 4) Add nft_log_dereference() macro to the nf_log infrastructure, patch from Marcelo Leitner. 5) Add some extra validation when registering logger families, also from Marcelo. 6) Some spelling cleanups from stephen hemminger. 7) Fix sparse warning in nf_logger_find_get(). 8) Add cgroup support to nf_tables meta, patch from Ana Rey. 9) A Kconfig fix for the new redir expression and fix sparse warnings in the new redir expression. 10) Fix several sparse warnings in the netfilter tree, from Florian Westphal. 11) Reduce verbosity when OOM in nfnetlink_log. User can basically do nothing when this situation occurs. 12) Add conntrack zone support to xt_connlimit, again from Florian. 13) Add netnamespace support to the h323 conntrack helper, contributed by Vasily Averin. 14) Remove unnecessary nul-pointer checks before free_percpu() and module_put(), from Markus Elfring. 15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-24 16:00:58 -05:00
David S. Miller	1459143386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ieee802154/fakehard.c A bug fix went into 'net' for ieee802154/fakehard.c, which is removed in 'net-next'. Add build fix into the merge from Stephen Rothwell in openvswitch, the logging macros take a new initial 'log' argument, a new call was added in 'net' so when we merge that in here we have to explicitly add the new 'log' arg to it else the build fails. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 22:28:24 -05:00
Jiri Pirko	62749e2cb3	vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 14:20:17 -05:00
Linus Lüssing	f0b4eeced5	bridge: fix netfilter/NF_BR_LOCAL_OUT for own, locally generated queries Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected for both locally generated IGMP and MLD queries. The IP header specific filter options are off by 14 Bytes for netfilter (actual output on interfaces is fine). NF_HOOK() expects the skb->data to point to the IP header, not the ethernet one (while dev_queue_xmit() does not). Luckily there is an br_dev_queue_push_xmit() helper function already - let's just use that. Introduced by `eb1d164143` ("bridge: Add core IGMP snooping support") Ebtables example: $ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \ --log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP before (broken): ~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \ MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \ SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \ DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \ IPv6 priority=0x3, Next Header=2 after (working): ~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \ MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \ SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \ DST=ff02:0000:0000:0000:0000:0000:0000:0001, \ IPv6 priority=0x0, Next Header=0 Signed-off-by: Linus Lüssing <linus.luessing@web.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-11-17 12:38:02 +01:00
Florian Westphal	5676864431	netfilter: fix various sparse warnings net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static? no; add include net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static? yes net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static? no; add include net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static? yes net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static? no; add include net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3) net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3) add __force, 3 is what we want. net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static? yes net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static? no; add include Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-11-13 12:14:42 +01:00
David S. Miller	4e84b496fd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-11-06 22:01:18 -05:00
Guenter Roeck	c1207c049b	netfilter: nft_reject_bridge: Fix powerpc build error Fix: net/bridge/netfilter/nft_reject_bridge.c: In function 'nft_reject_br_send_v6_unreach': net/bridge/netfilter/nft_reject_bridge.c:240:3: error: implicit declaration of function 'csum_ipv6_magic' csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, ^ make[3]: *** [net/bridge/netfilter/nft_reject_bridge.o] Error 1 Seen with powerpc:allmodconfig. Fixes: `523b929d54` ("netfilter: nft_reject_bridge: don't use IP stack to reject traffic") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-03 12:12:34 -05:00
David S. Miller	55b42b5ca2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/marvell.c Simple overlapping changes in drivers/net/phy/marvell.c Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-01 14:53:27 -04:00
Pablo Neira Ayuso	127917c29a	netfilter: nft_reject_bridge: restrict reject to prerouting and input Restrict the reject expression to the prerouting and input bridge hooks. If we allow this to be used from forward or any other later bridge hook, if the frame is flooded to several ports, we'll end up sending several reject packets, one per cloned packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:50:09 +01:00
Pablo Neira Ayuso	523b929d54	netfilter: nft_reject_bridge: don't use IP stack to reject traffic If the packet is received via the bridge stack, this cannot reject packets from the IP stack. This adds functions to build the reject packet and send it from the bridge stack. Comments and assumptions on this patch: 1) Validate the IPv4 and IPv6 headers before further processing, given that the packet comes from the bridge stack, we cannot assume they are clean. Truncated packets are dropped, we follow similar approach in the existing iptables match/target extensions that need to inspect layer 4 headers that is not available. This also includes packets that are directed to multicast and broadcast ethernet addresses. 2) br_deliver() is exported to inject the reject packet via bridge localout -> postrouting. So the approach is similar to what we already do in the iptables reject target. The reject packet is sent to the bridge port from which we have received the original packet. 3) The reject packet is forged based on the original packet. The TTL is set based on sysctl_ip_default_ttl for IPv4 and per-net ipv6.devconf_all hoplimit for IPv6. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:50:08 +01:00
Pablo Neira Ayuso	4d87716cd0	netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing Fixes: `36d2af5` ("netfilter: nf_tables: allow to filter from prerouting and postrouting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-31 12:44:56 +01:00
Kyeyoon Park	958501163d	bridge: Add support for IEEE 802.11 Proxy ARP This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows the AP devices to keep track of the hardware-address-to-IP-address mapping of the mobile devices within the WLAN network. The AP will learn this mapping via observing DHCP, ARP, and NS/NA frames. When a request for such information is made (i.e. ARP request, Neighbor Solicitation), the AP will respond on behalf of the associated mobile device. In the process of doing so, the AP will drop the multicast request frame that was intended to go out to the wireless medium. It was recommended at the LKS workshop to do this implementation in the bridge layer. vxlan.c is already doing something very similar. The DHCP snooping code will be added to the userspace application (hostapd) per the recommendation. This RFC commit is only for IPv4. A similar approach in the bridge layer will be taken for IPv6 as well. Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-27 19:02:04 -04:00
Herbert Xu	7677e86843	bridge: Do not compile options in br_parse_ip_options Commit `462fb2af97` bridge : Sanitize skb before it enters the IP stack broke when IP options are actually used because it mangles the skb as if it entered the IP stack which is wrong because the bridge is supposed to operate below the IP stack. Since nobody has actually requested for parsing of IP options this patch fixes it by simply reverting to the previous approach of ignoring all IP options, i.e., zeroing the IPCB. If and when somebody who uses IP options and actually needs them to be parsed by the bridge complains then we can revisit this. Reported-by: David Newall <davidn@davidnewall.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-24 14:24:03 +02:00
David S. Miller	64b1f00a08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2014-10-08 16:22:22 -04:00
Herbert Xu	93fdd47e52	bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING As we may defragment the packet in IPv4 PRE_ROUTING and refragment it after POST_ROUTING we should save the value of frag_max_size. This is still very wrong as the bridge is supposed to leave the packets intact, meaning that the right thing to do is to use the original frag_list for fragmentation. Unfortunately we don't currently guarantee that the frag_list is left untouched throughout netfilter so until this changes this is the best we can do. There is also a spot in FORWARD where it appears that we can forward a packet without going through fragmentation, mark it so that we can fix it later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-07 15:12:44 -04:00
David S. Miller	61b37d2f54	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains another batch with Netfilter/IPVS updates for net-next, they are: 1) Add abstracted ICMP codes to the nf_tables reject expression. We introduce four reasons to reject using ICMP that overlap in IPv4 and IPv6 from the semantic point of view. This should simplify the maintainance of dual stack rule-sets through the inet table. 2) Move nf_send_reset() functions from header files to per-family nf_reject modules, suggested by Patrick McHardy. 3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the code now that br_netfilter can be modularized. Convert remaining spots in the network stack code. 4) Use rcu_barrier() in the nf_tables module removal path to ensure that we don't leave object that are still pending to be released via call_rcu (that may likely result in a crash). 5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad) idea was to probe the word size based on the xtables match/target info size, but this assumption is wrong when you have to dump the information back to userspace. 6) Allow to filter from prerouting and postrouting in the nf_tables bridge. In order to emulate the ebtables NAT chains (which are actually simple filter chains with no special semantics), we have support filtering from this hooks too. 7) Add explicit module dependency between xt_physdev and br_netfilter. This provides a way to detect if the user needs br_netfilter from the configuration path. This should reduce the breakage of the br_netfilter modularization. 8) Cleanup coding style in ip_vs.h, from Simon Horman. 9) Fix crash in the recently added nf_tables masq expression. We have to register/unregister the notifiers to clean up the conntrack table entries from the module init/exit path, not from the rule addition / deletion path. From Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:32:37 -04:00
Vlad Yasevich	5be5a2df40	bridge: Add filtering support for default_pvid Currently when vlan filtering is turned on on the bridge, the bridge will drop all traffic untill the user configures the filter. This isn't very nice for ports that don't care about vlans and just want untagged traffic. A concept of a default_pvid was recently introduced. This patch adds filtering support for default_pvid. Now, ports that don't care about vlans and don't define there own filter will belong to the VLAN of the default_pvid and continue to receive untagged traffic. This filtering can be disabled by setting default_pvid to 0. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:37 -04:00
Vlad Yasevich	3df6bf45ec	bridge: Simplify pvid checks. Currently, if the pvid is not set, we return an illegal vlan value even though the pvid value is set to 0. Since pvid of 0 is currently invalid, just return 0 instead. This makes the current and future checks simpler. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:36 -04:00
Vlad Yasevich	96a20d9d7f	bridge: Add a default_pvid sysfs attribute This patch allows the user to set and retrieve default_pvid value. A new value can only be stored when vlan filtering is disabled. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-05 21:21:36 -04:00
Pablo Neira Ayuso	4b7fd5d97e	netfilter: explicit module dependency between br_netfilter and physdev You can use physdev to match the physical interface enslaved to the bridge device. This information is stored in skb->nf_bridge and it is set up by br_netfilter. So, this is only available when iptables is used from the bridge netfilter path. Since `34666d4` ("netfilter: bridge: move br_netfilter out of the core"), the br_netfilter code is modular. To reduce the impact of this change, we can autoload the br_netfilter if the physdev match is used since we assume that the users need br_netfilter in place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso	36d2af5998	netfilter: nf_tables: allow to filter from prerouting and postrouting This allows us to emulate the NAT table in ebtables, which is actually a plain filter chain that hooks at prerouting, output and postrouting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:30:56 +02:00
Pablo Neira Ayuso	51b0a5d8c2	netfilter: nft_reject: introduce icmp code abstraction for inet and bridge This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides an abstraction to the ICMP and ICMPv6 codes that you can use from the inet and bridge tables, they are: * NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable * NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable * NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable * NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited You can still use the specific codes when restricting the rule to match the corresponding layer 3 protocol. I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have different semantics depending on the table family and to allow the user to specify ICMP family specific codes if they restrict it to the corresponding family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-10-02 18:29:57 +02:00
Florian Fainelli	775dd692bd	net: bridge: add a br_set_state helper function In preparation for being able to propagate port states to e.g: notifiers or other kernel parts, do not manipulate the port state directly, but instead use a helper function which will allow us to do a bit more than just setting the state. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-10-01 22:03:50 -04:00
Florian Westphal	57f5877c11	netfilter: bridge: build br_nf_core only if required Eric reports build failure with CONFIG_BRIDGE_NETFILTER=n We insist to build br_nf_core.o unconditionally, but we must only do so if br_netfilter was enabled, else it fails to build due to functions being defined to empty stubs (and some structure members being defined out). Also, BRIDGE_NETFILTER=y\|m makes no sense when BRIDGE=n. Fixes: `34666d467` (netfilter: bridge: move br_netfilter out of the core) Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-30 14:07:51 -04:00
David S. Miller	852248449c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== pull request: netfilter/ipvs updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled. 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov. 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick. 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis. 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov. 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has ocurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bits generation ID. The less significant 16-bits of this ID is also exposed through res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound. 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes problems of different kind to people that don't want this: Jesper reported performance drop due to the inconditional hook registration and I remember to have read complains on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module. 8) Dump the set policy nf_tables that allows set parameterization. So userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero. 9) Use __seq_open_private() helper function to reduce boiler plate code in x_tables, From Rob Jones. 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may be pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules. ==================== Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-29 14:46:53 -04:00
Pablo Neira Ayuso	34666d467c	netfilter: bridge: move br_netfilter out of the core Jesper reported that br_netfilter always registers the hooks since this is part of the bridge core. This harms performance for people that don't need this. This patch modularizes br_netfilter so it can be rmmod'ed, thus, the hooks can be unregistered. I think the bridge netfilter should have been a separated module since the beginning, Patrick agreed on that. Note that this is breaking compatibility for users that expect that bridge netfilter is going to be available after explicitly 'modprobe bridge' or via automatic load through brctl. However, the damage can be easily undone by modprobing br_netfilter. The bridge core also spots a message to provide a clue to people that didn't notice that this has been deprecated. On top of that, the plan is that nftables will not rely on this software layer, but integrate the connection tracking into the bridge layer to enable stateful filtering and NAT, which is was bridge netfilter users seem to require. This patch still keeps the fake_dst_ops in the bridge core, since this is required by when the bridge port is initialized. So we can safely modprobe/rmmod br_netfilter anytime. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2014-09-26 18:42:31 +02:00
Pablo Neira Ayuso	7276ca3fa2	netfilter: bridge: nf_bridge_copy_header as static inline in header Move nf_bridge_copy_header() as static inline in netfilter_bridge.h header file. This patch prepares the modularization of the br_netfilter code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-26 18:42:30 +02:00
David S. Miller	1f6d80358d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-23 12:09:27 -04:00
Vlad Yasevich	c095f248e6	bridge: Fix br_should_learn to check vlan_enabled As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will not be initialized in br_should_learn() as that function is called only from br_handle_local_finish(). That is an input handler for link-local ethernet traffic so it perfectly correct to check br->vlan_enabled here. Reported-by: Toshiaki Makita<toshiaki.makita1@gmail.com> Fixes: `20adfa1` bridge: Check if vlan filtering is enabled only once. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-15 17:38:30 -04:00
Vlad Yasevich	635126b7ca	bridge: Allow clearing of pvid and untagged bitmap Currently, it is possible to modify the vlan filter configuration to add pvid or untagged support. For example: bridge vlan add vid 10 dev eth0 bridge vlan add vid 10 dev eth0 untagged pvid The second statement will modify vlan 10 to include untagged and pvid configuration. However, it is currently impossible to go backwards bridge vlan add vid 10 dev eth0 untagged pvid bridge vlan add vid 10 dev eth0 Here nothing happens. This patch correct this so that any modifiers not supplied are removed from the configuration. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:21:56 -04:00
Vlad Yasevich	20adfa1a81	bridge: Check if vlan filtering is enabled only once. The bridge code checks if vlan filtering is enabled on both ingress and egress. When the state flip happens, it is possible for the bridge to currently be forwarding packets and forwarding behavior becomes non-deterministic. Bridge may drop packets on some interfaces, but not others. This patch solves this by caching the filtered state of the packet into skb_cb on ingress. The skb_cb is guaranteed to not be over-written between the time packet entres bridge forwarding path and the time it leaves it. On egress, we can then check the cached state to see if we need to apply filtering information. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-13 17:21:56 -04:00
David S. Miller	0aac383353	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== nf-next pull request The following patchset contains Netfilter/IPVS updates for your net-next tree. Regarding nf_tables, most updates focus on consolidating the NAT infrastructure and adding support for masquerading. More specifically, they are: 1) use __u8 instead of u_int8_t in arptables header, from Mike Frysinger. 2) Add support to match by skb->pkttype to the meta expression, from Ana Rey. 3) Add support to match by cpu to the meta expression, also from Ana Rey. 4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from Vytas Dauksa. 5) Fix netnet and netportnet hash types the range support for IPv4, from Sergey Popovich. 6) Fix missing-field-initializer warnings resolved, from Mark Rustad. 7) Dan Carperter reported possible integer overflows in ipset, from Jozsef Kadlecsick. 8) Filter out accounting objects in nfacct by type, so you can selectively reset quotas, from Alexey Perevalov. 9) Move specific NAT IPv4 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4. 11) Move specific NAT IPv6 functions to the core so x_tables and nf_tables can share the same NAT IPv4 engine. 12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6. 13) Refactor code to add nft_delrule(), which can be reused in the enhancement of the NFT_MSG_DELTABLE to remove a table and its content, from Arturo Borrero. 14) Add a helper function to unregister chain hooks, from Arturo Borrero. 15) A cleanup to rename to nft_delrule_by_chain for consistency with the new nft_*() functions, also from Arturo. 16) Add support to match devgroup to the meta expression, from Ana Rey. 17) Reduce stack usage for IPVS socket option, from Julian Anastasov. 18) Remove unnecessary textsearch state initialization in xt_string, from Bojan Prtvar. 19) Add several helper functions to nf_tables, more work to prepare the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero. 20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from Arturo Borrero. 21) Support NAT flags in the nat expression to indicate the flavour, eg. random fully, from Arturo. 22) Add missing audit code to ebtables when replacing tables, from Nicolas Dichtel. 23) Generalize the IPv4 masquerading code to allow its re-use from nf_tables, from Arturo. 24) Generalize the IPv6 masquerading code, also from Arturo. 25) Add the new masq expression to support IPv4/IPv6 masquerading from nf_tables, also from Arturo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-10 12:46:32 -07:00
Rami Rosen	5aaa62d608	bridge: Cleanup of unncessary check. This patch removes an unncessary check in the br_afspec() method of br_netlink.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:32:11 -07:00
Jiri Pirko	1332351617	bridge: implement rtnl_link_ops->changelink Allow rtnetlink users to set bridge master info via IFLA_INFO_DATA attr This initial part implements forward_delay, hello_time, max_age options. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	e5c3ea5c66	bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr This initial part implements forward_delay, hello_time, max_age options. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	3ac636b859	bridge: implement rtnl_link_ops->slave_changelink Allow rtnetlink users to set port info via IFLA_INFO_SLAVE_DATA attr Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	ced8283f90	bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info Allow rtnetlink users to get port info in IFLA_INFO_SLAVE_DATA attr Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:55 -07:00
Jiri Pirko	0f49579a39	bridge: switch order of rx_handler reg and upper dev link The thing is that netdev_master_upper_dev_link calls call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates rtnl link message and during that, rtnl_link_ops->fill_slave_info is called. But with current ordering, rx_handler and IFF_BRIDGE_PORT are not set yet so there would have to be check for that in fill_slave_info callback. Resolve this by reordering to similar what bonding and team does to avoid the check. Also add removal of IFF_BRIDGE_PORT flag into error path. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-09 11:29:54 -07:00
Nicolas Dichtel	c55fbbb4a7	netfilter: ebtables: create audit records for replaces This is already done for x_tables (family AF_INET and AF_INET6), let's do it for AF_BRIDGE also. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-09-09 16:31:28 +02:00
Andreea-Cristina Bernat	0932997e34	br_multicast: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-22 12:23:11 -07:00
Vlad Yasevich	0d5501c1c8	net: Always untag vlan-tagged traffic on input. Currently the functionality to untag traffic on input resides as part of the vlan module and is build only when VLAN support is enabled in the kernel. When VLAN is disabled, the function vlan_untag() turns into a stub and doesn't really untag the packets. This seems to create an interesting interaction between VMs supporting checksum offloading and some network drivers. There are some drivers that do not allow the user to change tx-vlan-offload feature of the driver. These drivers also seem to assume that any VLAN-tagged traffic they transmit will have the vlan information in the vlan_tci and not in the vlan header already in the skb. When transmitting skbs that already have tagged data with partial checksum set, the checksum doesn't appear to be updated correctly by the card thus resulting in a failure to establish TCP connections. The following is a packet trace taken on the receiver where a sender is a VM with a VLAN configued. The host VM is running on doest not have VLAN support and the outging interface on the host is tg3: 10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294837885 ecr 0,nop,wscale 7], length 0 10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294838888 ecr 0,nop,wscale 7], length 0 This connection finally times out. I've only access to the TG3 hardware in this configuration thus have only tested this with TG3 driver. There are a lot of other drivers that do not permit user changes to vlan acceleration features, and I don't know if they all suffere from a similar issue. The patch attempt to fix this another way. It moves the vlan header stipping code out of the vlan module and always builds it into the kernel network core. This way, even if vlan is not supported on a virtualizatoin host, the virtual machines running on top of such host will still work with VLANs enabled. CC: Patrick McHardy <kaber@trash.net> CC: Nithin Nayak Sujir <nsujir@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-11 12:16:51 -07:00
Pablo Neira Ayuso	7926dbfa4b	netfilter: don't use mutex_lock_interruptible() Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Julian Anastasov <ja@ssi.bg>	2014-08-08 16:47:23 +02:00
Ken Helias	1d023284c3	list: fix order of arguments for hlist_add_after(_rcu) All other add functions for lists have the new item as first argument and the position where it is added as second argument. This was changed for no good reason in this function and makes using it unnecessary confusing. The name was changed to hlist_add_behind() to cause unconverted code to generate a compile error instead of using the wrong parameter order. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Ken Helias <kenhelias@firemail.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits] Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-06 18:01:24 -07:00
David S. Miller	d247b6ab3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/Makefile net/ipv6/sysctl_net_ipv6.c Two ipv6_table_template[] additions overlap, so the index of the ipv6_table[x] assignments needed to be adjusted. In the drivers/net/Makefile case, we've gotten rid of the garbage whereby we had to list every single USB networking driver in the top-level Makefile, there is just one "USB_NETWORKING" that guards everything. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 18:46:26 -07:00
Toshiaki Makita	fdb0a6626e	bridge: Update outdated comment on promiscuous mode Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an unnecessary operation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-05 16:37:10 -07:00
Michael S. Tsirkin	4b7a9168e1	bridge: remove a useless comment commit `6cbdceeb1c` bridge: Dump vlan information from a bridge port introduced a comment in an attempt to explain the code logic. The comment is unfinished so it confuses more than it explains, remove it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-08-04 12:46:51 -07:00
Toshiaki Makita	47fab41ab5	bridge: Don't include NDA_VLAN for FDB entries with vid 0 An FDB entry with vlan_id 0 doesn't mean it is used in vlan 0, but used when vlan_filtering is disabled. There is inconsistency around NDA_VLAN whose payload is 0 - even if we add an entry by RTM_NEWNEIGH without any NDA_VLAN, and even though adding an entry with NDA_VLAN 0 is prohibited, we get an entry with NDA_VLAN 0 by RTM_GETNEIGH. Dumping an FDB entry with vlan_id 0 shouldn't include NDA_VLAN. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-31 12:18:44 -07:00
Paul Bolle	d4da843e6f	netfilter: kill remnants of ulog targets The ulog targets were recently killed. A few references to the Kconfig macros CONFIG_IP_NF_TARGET_ULOG and CONFIG_BRIDGE_EBT_ULOG were left untouched. Kill these too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-25 14:55:44 +02:00
Pablo Neira Ayuso	85f5b3086a	netfilter: bridge: add reject support So you can reject IPv4 and IPv6 packets from bridge tables. If the ether proto is now known, default on dropping the packet instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-07-22 12:00:22 +02:00
David S. Miller	a8138f42d4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-20 21:01:43 -07:00
Tom Gundersen	c835a67733	net: set name_assign_type in alloc_netdev() Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) \| -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) \| -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-15 16:12:48 -07:00
Jamal Hadi Salim	5e6d243587	bridge: netlink dump interface at par with brctl Actually better than brctl showmacs because we can filter by bridge port in the kernel. The current bridge netlink interface doesnt scale when you have many bridges each with large fdbs or even bridges with many bridge ports And now for the science non-fiction novel you have all been waiting for.. //lets see what bridge ports we have root@moja-1:/configs/may30-iprt/bridge# ./bridge link show 8: eth1 state DOWN : <BROADCAST,MULTICAST> mtu 1500 master br0 state disabled priority 32 cost 19 17: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master br0 state disabled priority 32 cost 100 // show all.. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show 33:33:00:00:00:01 dev bond0 self permanent 33:33:00:00:00:01 dev dummy0 self permanent 33:33:00:00:00:01 dev ifb0 self permanent 33:33:00:00:00:01 dev ifb1 self permanent 33:33:00:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth0 self permanent 33:33:ff:22:01:01 dev eth0 self permanent 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 33:33:00:00:00:01 dev gretap0 self permanent da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent //filter by bridge root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent // bridge sw1 has no ports attached.. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br sw1 //filter by port root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show brport eth1 02:00:00:12:01:02 vlan 0 master br0 permanent 00:17:42:8a:b4:05 vlan 0 master br0 permanent 00:17:42:8a:b4:07 self permanent 33:33:00:00:00:01 self permanent // filter by port + bridge root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 brport sw1-p1 da:ac:46:27:d9:53 vlan 0 master br0 permanent 33:33:00:00:00:01 self permanent // for shits and giggles (as they say in New Brunswick), lets // change the mac that br0 uses // Note: a magical fdb entry with no brport is added ... root@moja-1:/configs/may30-iprt/bridge# ip link set dev br0 address 02:00:00:12:01:04 // lets see if we can see the unicorn .. root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show 33:33:00:00:00:01 dev bond0 self permanent 33:33:00:00:00:01 dev dummy0 self permanent 33:33:00:00:00:01 dev ifb0 self permanent 33:33:00:00:00:01 dev ifb1 self permanent 33:33:00:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth0 self permanent 33:33:ff:22:01:01 dev eth0 self permanent 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 33:33:00:00:00:01 dev gretap0 self permanent 02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent //can we see it if we filter by bridge? root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent 00:17:42:8a:b4:07 dev eth1 self permanent 33:33:00:00:00:01 dev eth1 self permanent 02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent 33:33:00:00:00:01 dev sw1-p1 self permanent Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-10 12:37:33 -07:00
Jamal Hadi Salim	5d5eacb34c	bridge: fdb dumping takes a filter device Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-10 12:37:33 -07:00
Linus Lüssing	c34963e216	bridge: export knowledge about the presence of IGMP/MLD queriers With this patch other modules are able to ask the bridge whether an IGMP or MLD querier exists on the according, bridged link layer. Multicast snooping can only be performed if a valid, selected querier exists on a link. Just like the bridge only enables its multicast snooping if a querier exists, e.g. batman-adv too can only activate its multicast snooping in bridged scenarios if a querier is present. For instance this export avoids having to reimplement IGMP/MLD querier message snooping and parsing in e.g. batman-adv, when multicast optimizations for bridged scenarios are added in the future. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-07-08 16:12:53 -07:00
Pablo Neira Ayuso	960649d192	netfilter: bridge: add generic packet logger This adds the generic plain text packet loggger for bridged packets. It routes the logging message to the real protocol packet logger. I decided not to refactor the ebt_log code for two reasons: 1) The ebt_log output is not consistent with the IPv4 and IPv6 Netfilter packet loggers. The output is different for no good reason and it adds redundant code to handle packet logging. 2) To avoid breaking backward compatibility for applications outthere that are parsing the specific ebt_log output, the ebt_log output has been left as is. So only nftables will use the new consistent logging format for logged bridged packets. More decisions coming in this patch: 1) This also removes ebt_log as default logger for bridged packets. Thus, nf_log_packet() routes packet to this new packet logger instead. This doesn't break backward compatibility since nf_log_packet() is not used to log packets in plain text format from anywhere in the ebtables/netfilter bridge code. 2) The new bridge packet logger also performs a lazy request to register the real IPv4, ARP and IPv6 netfilter packet loggers. If the real protocol logger is no available (not compiled or the module is not available in the system, not packet logging happens. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-27 13:20:47 +02:00
Pablo Neira Ayuso	5962815a6a	netfilter: nf_log: use an array of loggers instead of list Now that legacy ulog targets are not available anymore in the tree, we can have up to two possible loggers: 1) The plain text logging via kernel logging ring. 2) The nfnetlink_log infrastructure which delivers log messages to userspace. This patch replaces the list of loggers by an array of two pointers per family for each possible logger and it also introduces a new field to the nf_logger structure which indicates the position in the logger array (based on the logger type). This prepares a follow up patch that consolidates the nf_log_packet() interface by allowing to specify the logger as parameter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:28:43 +02:00
Pablo Neira Ayuso	7200135bc1	netfilter: kill ulog targets This has been marked as deprecated for quite some time and the NFLOG target replacement has been also available since 2006. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-06-25 19:28:43 +02:00
Li RongQing	a3f5ee71cd	bridge: use list_for_each_entry_continue_reverse use list_for_each_entry_continue_reverse to rollback in fdb_add_hw when add address failed Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-21 15:33:22 -07:00
Linus Lüssing	3993c4e159	bridge: fix compile error when compiling without IPv6 support Some fields in "struct net_bridge" aren't available when compiling the kernel without IPv6 support. Therefore adding a check/macro to skip the complaining code sections in that case. Introduced by `2cd4143192` ("bridge: memorize and export selected IGMP/MLD querier port") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:00:24 -07:00
Linus Lüssing	6c03ee8bda	bridge: fix smatch warning / potential null pointer dereference "New smatch warnings: net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error: we previously assumed 'group' could be null (see line 1349)" In the rare (sort of broken) case of a query having a Maximum Response Delay of zero, we could create a potential null pointer dereference. Fixing this by skipping the multicast specific MLD Query parsing again if no multicast group address is available. Introduced by `dc4eb53a99` ("bridge: adhere to querier election mechanism specified by RFCs") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-12 11:00:24 -07:00
Toshiaki Makita	204177f3f3	bridge: Support 802.1ad vlan filtering This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	f2808d226f	bridge: Prepare for forwarding another bridge group addresses If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	8580e2117c	bridge: Prepare for 802.1ad vlan filtering support This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Toshiaki Makita	1c5abb6c77	bridge: Add 802.1ad tx vlan acceleration Bridge device doesn't need to embed S-tag into skb->data. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:22:53 -07:00
Linus Lüssing	2cd4143192	bridge: memorize and export selected IGMP/MLD querier port Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in IGMP/MLD queriers to be able to reliably serve any multicast listener behind this same bridge. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	07f8ac4a1e	bridge: add export of multicast database adjacent to net_dev With this new, exported function br_multicast_list_adjacent(net_dev) a list of IPv4/6 addresses is returned. This list contains all multicast addresses sensed by the bridge multicast snooping feature on all bridge ports of the bridge interface of net_dev, excluding addresses from the specified net_device itself. Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in multicast listeners to be able to reliably serve them with multicast packets. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	dc4eb53a99	bridge: adhere to querier election mechanism specified by RFCs MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2 (RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the querier with lowest source address shall become the selected querier. So far the bridge stopped its querier as soon as it heard another querier regardless of its source address. This results in the "wrong" querier potentially becoming the active querier or a potential, unnecessary querying delay. With this patch the bridge memorizes the source address of the currently selected querier and ignores queries from queriers with a higher source address than the currently selected one. This slight optimization is supposed to make it more RFC compliant (but is rather uncritical and therefore probably not necessary to be queued for stable kernels). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:47 -07:00
Linus Lüssing	90010b36eb	bridge: rename struct bridge_mcast_query/querier The current naming of these two structs is very random, in that reversing their naming would not make any semantical difference. This patch tries to make the naming less confusing by giving them a more specific, distinguishable naming. This is also useful for the upcoming patches reintroducing the "struct bridge_mcast_querier" but for storing information about the selected querier (no matter if our own or a foreign querier). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-10 23:50:46 -07:00
Toshiaki Makita	e0a47d1f78	bridge: Fix incorrect judgment of promisc br_manage_promisc() incorrectly expects br_auto_port() to return only 0 or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-05 15:20:31 -07:00
David S. Miller	c99f7abf0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-03 23:32:12 -07:00
Roopa Prabhu	41c389d72c	bridge: Add bridge ifindex to bridge fdb notify msgs (This patch was previously posted as RFC at http://patchwork.ozlabs.org/patch/352677/) This patch adds NDA_MASTER attribute to neighbour attributes enum for bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs. Today bridge fdb notifications dont contain bridge information. Userspace can derive it from the port information in the fdb notification. However this is tricky in some scenarious. Example, bridge port delete notification comes before bridge fdb delete notifications. And we have seen problems in userspace when using libnl where, the bridge fdb delete notification handling code does not understand which bridge this fdb entry is part of because the bridge and port association has already been deleted. And these notifications (port membership and fdb) are generated on separate rtnl groups. Fixing the order of notifications could possibly solve the problem for some cases (I can submit a separate patch for that). This patch chooses to add NDA_MASTER to bridge fdb notify msgs because it not only solves the problem described above, but also helps userspace avoid another lookup into link msgs to derive the master index. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 17:58:55 -07:00
Toshiaki Makita	e0d7968ab6	bridge: Prevent insertion of FDB entry with disallowed vlan br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-02 13:38:23 -07:00
Jon Maxwell	c65c7a3066	bridge: notify user space after fdb update There has been a number incidents recently where customers running KVM have reported that VM hosts on different Hypervisors are unreachable. Based on pcap traces we found that the bridge was broadcasting the ARP request out onto the network. However some NICs have an inbuilt switch which on occasions were broadcasting the VMs ARP request back through the physical NIC on the Hypervisor. This resulted in the bridge changing ports and incorrectly learning that the VMs mac address was external. As a result the ARP reply was directed back onto the external network and VM never updated it's ARP cache. This patch will notify the bridge command, after a fdb has been updated to identify such port toggling. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:14:50 -07:00
wangweidong	019ee792d7	bridge: fix the unbalanced promiscuous count when add_if failed As commit `2796d0c648` ("bridge: Automatically manage port promiscuous mode."), make the add_if use dev_set_allmulti instead of dev_set_promiscuous, so when add_if failed, we should do dev_set_allmulti(dev, -1). Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Reviewed-by: Amos Kong <akong@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-01 22:05:16 -07:00
Pablo Neira	1708803ef2	netfilter: bridge: fix Kconfig unmet dependencies Before `f5efc69` ("netfilter: nf_tables: Add meta expression key for bridge interface name"), the entire net/bridge/netfilter/ directory depended on BRIDGE_NF_EBTABLES, ie. on ebtables. However, that directory already contained the nf_tables bridge extension that we should allow to compile separately. In `f5efc69`, we tried to generalize this by using CONFIG_BRIDGE_NETFILTER which was not a good idea since this option already existed and it is dedicated to enable the Netfilter bridge IP/ARP filtering. Let's try to fix this mess by: 1) making net/bridge/netfilter/ dependent on the toplevel CONFIG_NETFILTER option, just like we do with the net/netfilter and net/ipv{4,6}/netfilter/ directories. 2) Changing 'selects' to 'depends on' NETFILTER_XTABLES for BRIDGE_NF_EBTABLES. I believe this problem was already before `f5efc69`: warning: (BRIDGE_NF_EBTABLES) selects NETFILTER_XTABLES which has unmet direct dependencies (NET && INET && NETFILTER) 3) Fix ebtables/nf_tables bridge dependencies by making NF_TABLES_BRIDGE and BRIDGE_NF_EBTABLES dependent on BRIDGE and NETFILTER: warning: (NF_TABLES_BRIDGE && BRIDGE_NF_EBTABLES) selects BRIDGE_NETFILTER which has unmet direct dependencies (NET && BRIDGE && NETFILTER && INET && NETFILTER_ADVANCED) net/built-in.o: In function `br_parse_ip_options': br_netfilter.c:(.text+0x4a5ba): undefined reference to `ip_options_compile' br_netfilter.c:(.text+0x4a5ed): undefined reference to `ip_options_rcv_srr' net/built-in.o: In function `br_nf_pre_routing_finish': br_netfilter.c:(.text+0x4a8a4): undefined reference to `ip_route_input_noref' br_netfilter.c:(.text+0x4a987): undefined reference to `ip_route_output_flow' make: *** [vmlinux] Error 1 Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-26 00:42:30 -04:00
Cong Wang	b1282726d5	bridge: make br_device_notifier static Merge net/bridge/br_notify.c into net/bridge/br.c, since it has only br_device_event() and br.c is small. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 15:33:47 -04:00
David S. Miller	8af750d739	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== Netfilter/nftables updates for net-next The following patchset contains Netfilter/nftables updates for net-next, most relevantly they are: 1) Add set element update notification via netlink, from Arturo Borrero. 2) Put all object updates in one single message batch that is sent to kernel-space. Before this patch only rules where included in the batch. This series also introduces the generic transaction infrastructure so updates to all objects (tables, chains, rules and sets) are applied in an all-or-nothing fashion, these series from me. 3) Defer release of objects via call_rcu to reduce the time required to commit changes. The assumption is that all objects are destroyed in reverse order to ensure that dependencies betweem them are fulfilled (ie. rules and sets are destroyed first, then chains, and finally tables). 4) Allow to match by bridge port name, from Tomasz Bursztyka. This series include two patches to prepare this new feature. 5) Implement the proper set selection based on the characteristics of the data. The new infrastructure also allows you to specify your preferences in terms of memory and computational complexity so the underlying set type is also selected according to your needs, from Patrick McHardy. 6) Several cleanup patches for nft expressions, including one minor possible compilation breakage due to missing mark support, also from Patrick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-22 12:06:23 -04:00
stephen hemminger	025559eec8	bridge: fix spelling of promiscuous Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 21:10:08 -04:00
Alexei Starovoitov	d4f0e0958d	net: bridge: fix build fix build when BRIDGE_VLAN_FILTERING is not set Fixes: `2796d0c648` ("bridge: Automatically manage port promiscuous mode") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-18 20:09:50 -04:00
Vlad Yasevich	2796d0c648	bridge: Automatically manage port promiscuous mode. There exist configurations where the administrator or another management entity has the foreknowledge of all the mac addresses of end systems that are being bridged together. In these environments, the administrator can statically configure known addresses in the bridge FDB and disable flooding and learning on ports. This makes it possible to turn off promiscuous mode on the interfaces connected to the bridge. Here is why disabling flooding and learning allows us to control promiscuity: Consider port X. All traffic coming into this port from outside the bridge (ingress) will be either forwarded through other ports of the bridge (egress) or dropped. Forwarding (egress) is defined by FDB entries and by flooding in the event that no FDB entry exists. In the event that flooding is disabled, only FDB entries define the egress. Once learning is disabled, only static FDB entries provided by a management entity define the egress. If we provide information from these static FDBs to the ingress port X, then we'll be able to accept all traffic that can be successfully forwarded and drop all the other traffic sooner without spending CPU cycles to process it. Another way to define the above is as following equations: ingress = egress + drop expanding egress ingress = static FDB + learned FDB + flooding + drop disabling flooding and learning we a left with ingress = static FDB + drop By adding addresses from the static FDB entries to the MAC address filter of an ingress port X, we fully define what the bridge can process without dropping and can thus turn off promiscuous mode, thus dropping packets sooner. There have been suggestions that we may want to allow learning and update the filters with learned addresses as well. This would require mac-level authentication similar to 802.1x to prevent attacks against the hw filters as they are limited resource. Additionally, if the user places the bridge device in promiscuous mode, all ports are placed in promiscuous mode regardless of the changes to flooding and learning. Since the above functionality depends on full static configuration, we have also require that vlan filtering be enabled to take advantage of this. The reason is that the bridge has to be able to receive and process VLAN-tagged frames and the there are only 2 ways to accomplish this right now: promiscuous mode or vlan filtering. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	145beee8d6	bridge: Add addresses from static fdbs to non-promisc ports When a static fdb entry is created, add the mac address from this fdb entry to any ports that are currently running in non-promiscuous mode. These ports need this data so that they can receive traffic destined to these addresses. By default ports start in promiscuous mode, so this feature is disabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	f3a6ddf152	bridge: Introduce BR_PROMISC flag Introduce a BR_PROMISC per-port flag that will help us track if the current port is supposed to be in promiscuous mode or not. For now, always start in promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	8db24af71b	bridge: Add functionality to sync static fdb entries to hw Add code that allows static fdb entires to be synced to the hw list for a specified port. This will be used later to program ports that can function in non-promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	e028e4b8dc	bridge: Keep track of ports capable of automatic discovery. By default, ports on the bridge are capable of automatic discovery of nodes located behind the port. This is accomplished via flooding of unknown traffic (BR_FLOOD) and learning the mac addresses from these packets (BR_LEARNING). If the above functionality is disabled by turning off these flags, the port requires static configuration in the form of static FDB entries to function properly. This patch adds functionality to keep track of all ports capable of automatic discovery. This will later be used to control promiscuity settings. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:33 -04:00
Vlad Yasevich	63c3a622dd	bridge: Turn flag change macro into a function. Turn the flag change macro into a function to allow easier updates and to reduce space. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-16 17:06:32 -04:00
Wilfried Klaebe	7ad24ea4bf	net: get rid of SET_ETHTOOL_OPS net: get rid of SET_ETHTOOL_OPS Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that. Mostly done via coccinelle script: @@ struct ethtool_ops ops; struct net_device dev; @@ - SET_ETHTOOL_OPS(dev, ops); + dev->ethtool_ops = ops; Compile tested only, but I'd seriously wonder if this broke anything. Suggested-by: Dave Miller <davem@davemloft.net> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-13 17:43:20 -04:00
dingtianhong	f06c7f9f92	vlan: rename __vlan_find_dev_deep() to __vlan_find_dev_deep_rcu() The __vlan_find_dev_deep should always called in RCU, according David's suggestion, rename to __vlan_find_dev_deep_rcu looks more reasonable. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-05-12 14:39:13 -04:00
Vasily Averin	aff09ce303	bridge: superfluous skb->nfct check in br_nf_dev_queue_xmit Currently bridge can silently drop ipv4 fragments. If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4, br_nf_pre_routing defragments incoming ipv4 fragments but nfct check in br_nf_dev_queue_xmit does not allow re-fragment combined packet back, and therefore it is dropped in br_dev_queue_push_xmit without incrementing of any failcounters It seems the only way to hit the ip_fragment code in the bridge xmit path is to have a fragment list whose reassembled fragments go over the mtu. This only happens if nf_defrag is enabled. Thanks to Florian Westphal for providing feedback to clarify this. Defragmentation ipv4 is required not only in conntracks but at least in TPROXY target and socket match, therefore #ifdef is changed from NF_CONNTRACK_IPV4 to NF_DEFRAG_IPV4 Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-05-05 16:05:43 +02:00
Toshiaki Makita	30313a3d57	bridge: Handle IFLA_ADDRESS correctly when creating bridge device When bridge device is created with IFLA_ADDRESS, we are not calling br_stp_change_bridge_id(), which leads to incorrect local fdb management and bridge id calculation, and prevents us from receiving frames on the bridge device. Reported-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-27 19:46:17 -04:00
Tomasz Bursztyka	f5efc696cc	netfilter: nf_tables: Add meta expression key for bridge interface name NFT_META_BRI_IIFNAME to get packet input bridge interface name NFT_META_BRI_OIFNAME to get packet output bridge interface name Such meta key are accessible only through NFPROTO_BRIDGE family, on a dedicated nft meta module: nft_meta_bridge. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-24 10:37:28 +02:00
Toshiaki Makita	eb7076182d	bridge: Fix double free and memory leak around br_allowed_ingress br_allowed_ingress() has two problems. 1. If br_allowed_ingress() is called by br_handle_frame_finish() and vlan_untag() in br_allowed_ingress() fails, skb will be freed by both vlan_untag() and br_handle_frame_finish(). 2. If br_allowed_ingress() is called by br_dev_xmit() and br_allowed_ingress() fails, the skb will not be freed. Fix these two problems by freeing the skb in br_allowed_ingress() if it fails. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 15:12:47 -04:00
Thomas Graf	c58dd2dd44	netfilter: Can't fail and free after table replacement All xtables variants suffer from the defect that the copy_to_user() to copy the counters to user memory may fail after the table has already been exchanged and thus exposed. Return an error at this point will result in freeing the already exposed table. Any subsequent packet processing will result in a kernel panic. We can't copy the counters before exposing the new tables as we want provide the counter state after the old table has been unhooked. Therefore convert this into a silent error. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-04-05 17:46:22 +02:00
Vlad Yasevich	f6367b4660	bridge: use is_skb_forwardable in forward path Use existing function instead of trying to use our own. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-31 16:04:04 -04:00
David S. Miller	64c27237a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/marvell/mvneta.c The mvneta.c conflict is a case of overlapping changes, a conversion to devm_ioremap_resource() vs. a conversion to netdev_alloc_pcpu_stats. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-29 18:48:54 -04:00
Eric W. Biederman	a8779ec1c5	netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit `47be03a28c` Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit `c4cdef9b71` Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-29 17:58:37 -04:00
Vlad Yasevich	fc92f745f8	bridge: Fix crash with vlan filtering and tcpdump When the vlan filtering is enabled on the bridge, but the filter is not configured on the bridge device itself, running tcpdump on the bridge device will result in a an Oops with NULL pointer dereference. The reason is that br_pass_frame_up() will bypass the vlan check because promisc flag is set. It will then try to get the table pointer and process the packet based on the table. Since the table pointer is NULL, we oops. Catch this special condition in br_handle_vlan(). Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 17:14:02 -04:00
Toshiaki Makita	99b192da9c	bridge: Fix handling stacked vlan tags If a bridge with vlan_filtering enabled receives frames with stacked vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not only the outer tag but also the inner tag. br_vlan_untag() is called only from br_handle_vlan(), and in this case, it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already been set before calling br_handle_vlan(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 16:33:09 -04:00
Toshiaki Makita	12464bb8de	bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci if they are tagged, but if vlan tx offload is manually disabled on bridge device and frames are sent from vlan device on the bridge device, the tags are embedded in skb->data and they break this assumption. Extract embedded vlan tags and move them to vlan_tci at ingress. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-28 16:33:09 -04:00
Monam Agarwal	cd18721e52	net/bridge: Use RCU_INIT_POINTER(x, NULL) in br_vlan.c This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL) The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL) Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-27 00:18:09 -04:00
Eric W. Biederman	57a7744e09	net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:41:36 -04:00
David S. Miller	85dcce7a73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/r8152.c drivers/net/xen-netback/netback.c Both the r8152 and netback conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-14 22:31:55 -04:00
Linus Lüssing	20a599bec9	bridge: multicast: enable snooping on general queries only Without this check someone could easily create a denial of service by injecting multicast-specific queries to enable the bridge snooping part if no real querier issuing periodic general queries is present on the link which would result in the bridge wrongly shutting down ports for multicast traffic as the bridge did not learn about these listeners. With this patch the snooping code is enabled upon receiving valid, general queries only. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-11 23:22:10 -04:00
Linus Lüssing	9ed973cc40	bridge: multicast: add sanity check for general query destination General IGMP and MLD queries are supposed to have the multicast link-local all-nodes address as their destination according to RFC2236 section 9, RFC3376 section 4.1.12/9.1, RFC2710 section 8 and RFC3810 section 5.1.15. Without this check, such malformed IGMP/MLD queries can result in a denial of service: The queries are ignored by most IGMP/MLD listeners therefore they will not respond with an IGMP/MLD report. However, without this patch these malformed MLD queries would enable the snooping part in the bridge code, potentially shutting down the according ports towards these hosts for multicast traffic as the bridge did not learn about these listeners. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-11 23:22:10 -04:00
Hannes Frederic Sowa	e90c14835b	inet: remove now unused flag DST_NOPEER Commit `e688a60480` ("net: introduce DST_NOPEER dst flag") introduced DST_NOPEER because because of crashes in ipv6_select_ident called from udp6_ufo_fragment. Since commit `916e4cf46d` ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") we don't call ipv6_select_ident any more from ip6_ufo_append_data, thus this flag lost its purpose and can be removed. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-06 13:15:52 -05:00
Linus Lüssing	6565b9eeef	bridge: multicast: add sanity check for query source addresses MLD queries are supposed to have an IPv6 link-local source address according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch adds a sanity check to ignore such broken MLD queries. Without this check, such malformed MLD queries can result in a denial of service: The queries are ignored by any MLD listener therefore they will not respond with an MLD report. However, without this patch these malformed MLD queries would enable the snooping part in the bridge code, potentially shutting down the according ports towards these hosts for multicast traffic as the bridge did not learn about these listeners. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-03-05 20:40:24 -05:00
Joe Perches	0409114282	bridge: netfilter: Use ether_addr_copy Convert the uses of memcpy to ether_addr_copy because for some architectures it is smaller and faster. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:16:44 -05:00
Joe Perches	e5a727f663	bridge: Use ether_addr_copy and ETH_ALEN Convert the more obvious uses of memcpy to ether_addr_copy. There are still uses of memcpy that could be converted but these addresses are __aligned(2). Convert a couple uses of 6 in gr_private.h to ETH_ALEN. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-24 19:16:43 -05:00
WANG Cong	1c213bd24a	net: introduce netdev_alloc_pcpu_stats() for drivers There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-14 15:49:55 -05:00
Toshiaki Makita	ac4c886883	bridge: Prevent possible race condition in br_fdb_change_mac_address br_fdb_change_mac_address() calls fdb_insert()/fdb_delete() without br->hash_lock. These hash list updates are racy with br_fdb_update()/br_fdb_cleanup(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	424bb9c97c	bridge: Properly check if local fdb entry can be deleted when deleting vlan Vlan codes unconditionally delete local fdb entries. We should consider the possibility that other ports have the same address and vlan. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev eth0 vid 10 bridge vlan add dev eth1 vid 10 bridge vlan add dev br0 vid 10 self We will have fdb entry such that f->dst == eth0, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, delete eth0 vlan 10. bridge vlan del dev eth0 vid 10 In this case, we still need the entry for br0, but it will be deleted. Note that br0 needs the entry even though its mac address is not set manually. To delete the entry with proper condition checking, fdb_delete_local() is suitable to use. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	a778e6d1a5	bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port br_fdb_delete_by_port() doesn't care about vlan and mac address of the bridge device. As the check is almost the same as mac address changing, slightly modify fdb_delete_local() and use it. Note that we can always set added_by_user to 0 in fdb_delete_local() because - br_fdb_delete_by_port() calls fdb_delete_local() for local entries regardless of its added_by_user. In this case, we have to check if another port has the same address and vlan, and if found, we have to create the entry (by changing dst). This is kernel-added entry, not user-added. - br_fdb_changeaddr() doesn't call fdb_delete_local() for user-added entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:34 -08:00
Toshiaki Makita	960b589f86	bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address br_fdb_change_mac_address() doesn't check if the local entry has the same address as any of bridge ports. Although I'm not sure when it is beneficial, current implementation allow the bridge device to receive any mac address of its ports. To preserve this behavior, we have to check if the mac address of the entry being deleted is identical to that of any port. As this check is almost the same as that in br_fdb_changeaddr(), create a common function fdb_delete_local() and call it from br_fdb_changeadddr() and br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2b292fb4a5	bridge: Fix the way to check if a local fdb entry can be deleted We should take into account the followings when deleting a local fdb entry. - nbp_vlan_find() can be used only when vid != 0 to check if an entry is deletable, because a fdb entry with vid 0 can exist at any time while nbp_vlan_find() always return false with vid 0. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address 12:34:56:78:90:ab brctl addif br0 eth0 brctl addif br0 eth1 ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb entry 12:34:56:78:90:ab will be deleted even though the bridge port eth1 still has that address. - The port to which the bridge device is attached might needs a local entry if its mac address is set manually. Example of problematic case: ip link set eth0 address 12:34:56:78:90:ab brctl addif br0 eth0 ip link set br0 address 12:34:56:78:90:ab ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the fdb still must have the entry 12:34:56:78:90:ab, but it will be deleted. We can use br->dev->addr_assign_type to check if the address is manually set or not, but I propose another approach. Since we delete and insert local entries whenever changing mac address of the bridge device, we can change dst of the entry to NULL regardless of addr_assign_type when deleting an entry associated with a certain port, and if it is found to be unnecessary later, then delete it. That is, if changing mac address of a port, the entry might be changed to its dst being NULL first, but is eventually deleted when recalculating and changing bridge id. This approach is especially useful when we want to share the code with deleting vlan in which the bridge device might want such an entry regardless of addr_assign_type, and makes things easy because we don't have to consider if mac address of the bridge device will be changed or not at the time we delete a local entry of a port, which means fdb code will not be bothered even if the bridge id calculating logic is changed in the future. Also, this change reduces inconsistent state, where frames whose dst is the mac address of the bridge, can't reach the bridge because of premature fdb entry deletion. This change reduces the possibility that the bridge device replies unreachable mac address to arp requests, which could occur during the short window between calling del_nbp() and br_stp_recalculate_bridge_id() in br_del_if(). This will effective after br_fdb_delete_by_port() starts to use the same code by following patch. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a4b816d8ba	bridge: Change local fdb entries whenever mac address of bridge device changes Vlan code may need fdb change when changing mac address of bridge device even if it is caused by the mac address changing of a bridge port. Example configuration: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # br0 will have mac address 12:34:56:78:90:ab bridge vlan add dev br0 vid 10 self bridge vlan add dev eth0 vid 10 We will have fdb entry such that f->dst == NULL, f->vlan_id == 10 and f->addr == 12:34:56:78:90:ab at this time. Next, change the mac address of eth0 to greater value. ip link set eth0 address ee:ff:12:34:56:78 Then, mac address of br0 will be recalculated and set to aa:bb:cc:dd:ee:ff. However, an entry aa:bb:cc:dd:ee:ff will not be created and we will be not able to communicate using br0 on vlan 10. Address this issue by deleting and adding local entries whenever changing the mac address of the bridge device. If there already exists an entry that has the same address, for example, in case that br_fdb_changeaddr() has already inserted it, br_fdb_change_mac_address() will simply fail to insert it and no duplicated entry will be made, as it was. This approach also needs br_add_if() to call br_fdb_insert() before br_stp_recalculate_bridge_id() so that we don't create an entry whose dst == NULL in this function to preserve previous behavior. Note that this is a slight change in behavior where the bridge device can receive the traffic to the new address before calling br_stp_recalculate_bridge_id() in br_add_if(). However, it is not a problem because we have already the address on the new port and such a way to insert new one before recalculating bridge id is taken in br_device_event() as well. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a3ebb7efe7	bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address We have been always failed to delete the old entry at br_fdb_change_mac_address() because br_set_mac_address() updates dev->dev_addr before calling br_fdb_change_mac_address() and br_fdb_change_mac_address() uses dev->dev_addr to find the old entry. That update of dev_addr is completely unnecessary because the same work is done in br_stp_change_bridge_id() which is called right away after calling br_fdb_change_mac_address(). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	2836882fe0	bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr Since commit `bc9a25d21e` ("bridge: Add vlan support for local fdb entries"), br_fdb_changeaddr() has inserted a new local fdb entry only if it can find old one. But if we have two ports where they have the same address or user has deleted a local entry, there will be no entry for one of the ports. Example of problematic case: ip link set eth0 address aa:bb:cc:dd:ee:ff ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 brctl addif br0 eth1 # eth1 will not have a local entry due to dup. ip link set eth1 address 12:34:56:78:90:ab Then, the new entry for the address 12:34:56:78:90:ab will not be created, and the bridge device will not be able to communicate. Insert new entries regardless of whether we can find old entries or not. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Toshiaki Makita	a5642ab474	bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr br_fdb_changeaddr() assumes that there is at most one local entry per port per vlan. It used to be true, but since commit `36fd2b63e3` ("bridge: allow creating/deleting fdb entries via netlink"), it has not been so. Therefore, the function might fail to search a correct previous address to be deleted and delete an arbitrary local entry if user has added local entries manually. Example of problematic case: ip link set eth0 address ee:ff:12:34:56:78 brctl addif br0 eth0 bridge fdb add 12:34:56:78:90:ab dev eth0 master ip link set eth0 address aa:bb:cc:dd:ee:ff Then, the address 12:34:56:78:90:ab might be deleted instead of ee:ff:12:34:56:78, the original mac address of eth0. Address this issue by introducing a new flag, added_by_user, to struct net_bridge_fdb_entry. Note that br_fdb_delete_by_port() has to set added_by_user to 0 in cases like: ip link set eth0 address 12:34:56:78:90:ab ip link set eth1 address aa:bb:cc:dd:ee:ff brctl addif br0 eth0 bridge fdb add aa:bb:cc:dd:ee:ff dev eth0 master brctl addif br0 eth1 brctl delif br0 eth0 In this case, kernel should delete the user-added entry aa:bb:cc:dd:ee:ff, but it also should have been added by "brctl addif br0 eth1" originally, so we don't delete it and treat it a new kernel-created entry. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-10 14:34:33 -08:00
Cong Wang	dbe173079a	bridge: fix netconsole setup over bridge Commit `93d8bf9fb8` ("bridge: cleanup netpoll code") introduced a check in br_netpoll_enable(), but this check is incorrect for br_netpoll_setup(). This patch moves the code after the check into __br_netpoll_enable() and calls it in br_netpoll_setup(). For br_add_if(), the check is still needed. Fixes: `93d8bf9fb8` ("bridge: cleanup netpoll code") Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Tested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-02-06 21:28:06 -08:00
Toshiaki Makita	bdf4351bbc	bridge: Remove unnecessary vlan_put_tag in br_handle_vlan br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing port is the bridge device. This is unnecessary because __netif_receive_skb_core() can handle skbs with HW accelerated vlan tag. In current implementation, __netif_receive_skb_core() needs to extract the vlan tag embedded in skb data. This could cause low network performance especially when receiving frames at a high frame rate on the bridge device. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:29:27 -08:00
WANG Cong	b86f81cca9	bridge: move br_net_exit() to br.c And it can become static. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-13 23:42:39 -08:00
Patrick McHardy	3876d22dba	netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain() We don't encode argument types into function names and since besides nft_do_chain() there are only AF-specific versions, there is no risk of confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:16 +01:00
Patrick McHardy	fa2c1de0bb	netfilter: nf_tables: minor nf_chain_type cleanups Minor nf_chain_type cleanups: - reorder struct to plug a hoe - rename struct module member to "owner" for consistency - rename nf_hookfn array to "hooks" for consistency - reorder initializers for better readability Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	2a37d755b8	netfilter: nf_tables: constify chain type definitions and pointers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:15 +01:00
Patrick McHardy	88ce65a71c	netfilter: nf_tables: add missing module references to chain types In some cases we neither take a reference to the AF info nor to the chain type, allowing the module to be unloaded while in use. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-09 20:17:14 +01:00
Patrick McHardy	115a60b173	netfilter: nf_tables: add support for multi family tables Add support to register chains to multiple hooks for different address families for mixed IPv4/IPv6 tables. Signed-off-by: Patrick McHardy <kaber@trash.net>	2014-01-07 23:55:46 +01:00
Patrick McHardy	3b088c4bc0	netfilter: nf_tables: make chain types override the default AF functions Currently the AF-specific hook functions override the chain-type specific hook functions. That doesn't make too much sense since the chain types are a special case of the AF-specific hooks. Make the AF-specific hook functions the default and make the optional chain type hooks override them. As a side effect, the necessary code restructuring reduces the code size, f.i. in case of nf_tables_ipv4.o: nf_tables_ipv4_init_net \| -24 nft_do_chain_ipv4 \| -113 2 functions changed, 137 bytes removed, diff: -137 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2014-01-07 23:50:43 +01:00
David S. Miller	56a4342dfe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 17:37:45 -05:00
sfeldma@cumulusnetworks.com	fbf2671bb8	bridge: use DEVICE_ATTR_xx macros Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:40:46 -05:00
Curt Brune	fe0d692bbc	bridge: use spin_lock_bh() in br_multicast_set_hash_max br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-06 16:39:47 -05:00
Li RongQing	8f84985fec	net: unify the pcpu_tstats and br_cpu_netstats as one They are same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-04 20:10:24 -05:00
stephen hemminger	3678a9d863	netlink: cleanup rntl_af_register The function __rtnl_af_register is never called outside this code, and the return value is always 0. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-01 23:42:19 -05:00
tanxiaojun	97ad8b53e6	bridge: change the position of '{' to the pre line That open brace { should be on the previous line. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 19:27:26 -05:00
tanxiaojun	56b148eb94	bridge: change "foo* bar" to "foo bar" "foo bar" should be "foo *bar". Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 19:27:26 -05:00
tanxiaojun	31a5b837c2	bridge: add space before '(/{', after ',', etc. Spaces required before the open parenthesis '(', before the open brace '{', after that ',' and around that '?/:'. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 19:27:26 -05:00
tanxiaojun	a97bfc1d1f	bridge: remove unnecessary parentheses Return is not a function, parentheses are not required. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 19:27:26 -05:00
tanxiaojun	87e823b3d5	bridge: remove unnecessary condition judgment Because err is always negative, remove unnecessary condition judgment. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-19 19:27:25 -05:00
tanxiaojun	1a81a2e0db	bridge: spelling fixes Fix spelling errors in bridge driver. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-18 17:53:22 -05:00
stephen hemminger	8e3bff96af	net: more spelling fixes Various spelling fixes in networking stack Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-10 21:57:11 -05:00
David S. Miller	34f9f43710	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-09 20:20:14 -05:00
Jiri Pirko	859828c0ea	br: fix use of ->rx_handler_data in code executed on non-rx_handler path br_stp_rcv() is reached by non-rx_handler path. That means there is no guarantee that dev is bridge port and therefore simple NULL check of ->rx_handler_data is not enough. There is need to check if dev is really bridge port and since only rcu read lock is held here, do it by checking ->rx_handler pointer. Note that synchronize_net() in netdev_rx_handler_unregister() ensures this approach as valid. Introduced originally by: commit `f350a0a873` "bridge: use rx_handler_data pointer to store net_bridge_port pointer" Fixed but not in the best way by: commit `b5ed54e94d` "bridge: fix RCU races with bridge port" Reintroduced by: commit `716ec052d2` "bridge: fix NULL pointer deref of br_port_get_rcu" Please apply to stable trees as well. Thanks. RH bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1025770 Reported-by: Laine Stump <laine@redhat.com> Debugged-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 15:41:40 -05:00
Jeff Kirsher	e664eabd18	netfilter: Fix FSF address in file headers Several files refer to an old address for the Free Software Foundation in the file header comment. Resolve by replacing the address with the URL <http://www.gnu.org/licenses/> so that we do not have to keep updating the header comments anytime the address changes. CC: netfilter@vger.kernel.org CC: Pablo Neira Ayuso <pablo@netfilter.org> CC: Patrick McHardy <kaber@trash.net> CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-12-06 12:37:57 -05:00
David S. Miller	cd2cc01b67	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains fixes for your net tree, they are: * Remove extra quote from connlimit configuration in Kconfig, from Randy Dunlap. * Fix missing mss option in syn packets sent to the backend in our new synproxy target, from Martin Topholm. * Use window scale announced by client when sending the forged syn to the backend, from Martin Topholm. * Fix IPv6 address comparison in ebtables, from Luís Fernando Cornachioni Estrozi. * Fix wrong endianess in sequence adjustment which breaks helpers in NAT configurations, from Phil Oester. * Fix the error path handling of nft_compat, from me. * Make sure the global conntrack counter is decremented after the object has been released, also from me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-21 12:44:15 -05:00
Ding Tianhong	f873042093	bridge: flush br's address entry in fdb when remove the bridge dev When the following commands are executed: brctl addbr br0 ifconfig br0 hw ether <addr> rmmod bridge The calltrace will occur: [ 563.312114] device eth1 left promiscuous mode [ 563.312188] br0: port 1(eth1) entered disabled state [ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects [ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9 [ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8 [ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8 [ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78 [ 563.468209] Call Trace: [ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78 [ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100 [ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge] [ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge] [ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0 [ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b [ 570.377958] Bridge firewalling registered --------------------------- cut here ------------------------------- The reason is that when the bridge dev's address is changed, the br_fdb_change_mac_address() will add new address in fdb, but when the bridge was removed, the address entry in the fdb did not free, the bridge_fdb_cache still has objects when destroy the cache, Fix this by flushing the bridge address entry when removing the bridge. v2: according to the Toshiaki Makita and Vlad's suggestion, I only delete the vlan0 entry, it still have a leak here if the vlan id is other number, so I need to call fdb_delete_by_port(br, NULL, 1) to flush all entries whose dst is NULL for the bridge. Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-20 15:31:11 -05:00
Linus Torvalds	1ee2dcc224	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...	2013-11-19 15:50:47 -08:00
Luís Fernando Cornachioni Estrozi	acab78b996	netfilter: ebt_ip6: fix source and destination matching This bug was introduced on commit `0898f99a2`. This just recovers two checks that existed before as suggested by Bart De Schuymer. Signed-off-by: Luís Fernando Cornachioni Estrozi <lestrozi@uolinc.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-11-19 15:33:29 +01:00
Toshiaki Makita	b4e09b29c7	bridge: Fix memory leak when deleting bridge with vlan filtering enabled We currently don't call br_vlan_flush() when deleting a bridge, which leads to memory leak if br->vlan_info is allocated. Steps to reproduce: while : do brctl addbr br0 bridge vlan add dev br0 vid 10 self brctl delbr br0 done We can observe the cache size of corresponding slab entry (as kmalloc-2048 in SLUB) is increased. kmemleak output: unreferenced object 0xffff8800b68a7000 (size 2048): comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff .........H.6.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220 [<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge] [<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge] [<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200 [<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260 [<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0 [<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30 [<ffffffff81527ccd>] netlink_unicast+0xdd/0x190 [<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740 [<ffffffff814e8368>] sock_sendmsg+0x88/0xc0 [<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0 [<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90 [<ffffffff814e922e>] SyS_sendmsg+0xe/0x10 [<ffffffff81601669>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-14 16:16:34 -05:00
Toshiaki Makita	dbbaf949bc	bridge: Call vlan_vid_del for all vids at nbp_vlan_flush We should call vlan_vid_del for all vids at nbp_vlan_flush to prevent vid_info->refcount from being leaked when detaching a bridge port. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-14 16:16:34 -05:00
Toshiaki Makita	192368372d	bridge: Use vlan_vid_[add/del] instead of direct ndo_vlan_rx_[add/kill]_vid calls We should use wrapper functions vlan_vid_[add/del] instead of ndo_vlan_rx_[add/kill]_vid. Otherwise, we might be not able to communicate using vlan interface in a certain situation. Example of problematic case: vconfig add eth0 10 brctl addif br0 eth0 bridge vlan add dev eth0 vid 10 bridge vlan del dev eth0 vid 10 brctl delif br0 eth0 In this case, we cannot communicate via eth0.10 because vlan 10 is filtered by NIC that has the vlan filtering feature. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-14 16:16:34 -05:00
Linus Torvalds	5e30025a31	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...	2013-11-14 16:30:30 +09:00
John Stultz	827da44c61	net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-11-06 12:40:25 +01:00
David S. Miller	f8785c5514	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== This batch contains fives nf_tables patches for your net-next tree, they are: * Fix possible use after free in the module removal path of the x_tables compatibility layer, from Dan Carpenter. * Add filter chain type for the bridge family, from myself. * Fix Kconfig dependencies of the nf_tables bridge family with the core, from myself. * Fix sparse warnings in nft_nat, from Tomasz Bursztyka. * Remove duplicated include in the IPv4 family support for nf_tables, from Wei Yongjun. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 19:48:57 -05:00
David S. Miller	72c39a0ade	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== This is another batch containing Netfilter/IPVS updates for your net-next tree, they are: * Six patches to make the ipt_CLUSTERIP target support netnamespace, from Gao feng. * Two cleanups for the nf_conntrack_acct infrastructure, introducing a new structure to encapsulate conntrack counters, from Holger Eitzenberger. * Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann. * Skip checksum recalculation in SCTP support for IPVS, also from Daniel Borkmann. * Fix behavioural change in xt_socket after IP early demux, from Florian Westphal. * Fix bogus large memory allocation in the bitmap port set type in ipset, from Jozsef Kadlecsik. * Fix possible compilation issues in the hash netnet set type in ipset, also from Jozsef Kadlecsik. * Define constants to identify netlink callback data in ipset dumps, again from Jozsef Kadlecsik. * Use sock_gen_put() in xt_socket to replace xt_socket_put_sk, from Eric Dumazet. * Improvements for the SH scheduler in IPVS, from Alexander Frolkin. * Remove extra delay due to unneeded rcu barrier in IPVS net namespace cleanup path, from Julian Anastasov. * Save some cycles in ip6t_REJECT by skipping checksum validation in packets leaving from our stack, from Stanislav Fomichev. * Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 19:46:58 -05:00
David S. Miller	394efd19d5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-11-04 13:48:30 -05:00
Vlad Yasevich	06499098a0	bridge: pass correct vlan id to multicast code Currently multicast code attempts to extrace the vlan id from the skb even when vlan filtering is disabled. This can lead to mdb entries being created with the wrong vlan id. Pass the already extracted vlan id to the multicast filtering code to make the correct id is used in creation as well as lookup. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:40:08 -04:00
Pablo Neira Ayuso	46413825a7	netfilter: bridge: nf_tables: add filter chain type This patch adds the filter chain type which is required to create filter chains in the bridge family from userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-28 18:07:35 +01:00
Pablo Neira Ayuso	6e078bc2f2	netfilter: bridge: fix nf_tables bridge dependencies with main core when CONFIG_NF_TABLES[_MODULE] is not enabled, but CONFIG_NF_TABLES_BRIDGE is enabled: net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_init_net': net/bridge/netfilter/nf_tables_bridge.c:24:5: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:25:9: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:28:2: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:30:34: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:35:11: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_exit_net': net/bridge/netfilter/nf_tables_bridge.c:41:27: error: 'struct net' has no member named 'nft' net/bridge/netfilter/nf_tables_bridge.c:42:11: error: 'struct net' has no member named 'nft' Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-28 17:41:21 +01:00
Florian Westphal	6b8dbcf2c4	bridge: netfilter: orphan skb before invoking ip netfilter hooks Pekka Pietikäinen reports xt_socket behavioural change after commit 00028aa37098o (netfilter: xt_socket: use IP early demux). Reason is xt_socket now no longer does an unconditional sk lookup - it re-uses existing skb->sk if possible, assuming ->sk was set by ip early demux. However, when netfilter is invoked via bridge, this can cause 'bogus' sockets to be examined by the match, e.g. a 'tun' device socket. bridge netfilter should orphan the skb just like the routing path before invoking ipv4/ipv6 netfilter hooks to avoid this. Reported-and-tested-by: Pekka Pietikäinen <pp@ee.oulu.fi> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-27 21:44:33 +01:00
David S. Miller	afb14c7cb6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== The following patchset contains three netfilter fixes for your net tree, they are: * A couple of fixes to resolve info leak to userspace due to uninitialized memory area in ulogd, from Mathias Krause. * Fix instruction ordering issues that may lead to the access of uninitialized data in x_tables. The problem involves the table update (producer) and the main packet matching (consumer) routines. Detected in SMP ARMv7, from Will Deacon. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:55:04 -04:00
David S. Miller	c3fa32b976	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-23 16:49:34 -04:00
Linus Lüssing	454594f3b9	Revert "bridge: only expire the mdb entry when query is received" While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (`b00589af3b`) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit `f144febd93`. Revert "bridge: do not call setup_timer() multiple times" This reverts commit `1faabf2aab`. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit `c7e8e8a8f7`. Revert "bridge: only expire the mdb entry when query is received" This reverts commit `9f00b2e7cf`. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-22 14:41:02 -04:00
Joe Perches	348662a142	net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-19 19:12:11 -04:00
Toshiaki Makita	dfb5fa32c6	bridge: Fix updating FDB entries when the PVID is applied We currently set the value that variable vid is pointing, which will be used in FDB later, to 0 at br_allowed_ingress() when we receive untagged or priority-tagged frames, even though the PVID is valid. This leads to FDB updates in such a wrong way that they are learned with VID 0. Update the value to that of PVID if the PVID is applied. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-18 16:02:53 -04:00
Toshiaki Makita	d1c6c708c4	bridge: Fix the way the PVID is referenced We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is set or not at br_get_pvid(), while we don't care about the bit in adding/deleting the PVID, which makes it impossible to forward any incomming untagged frame with vlan_filtering enabled. Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate that the PVID is not set, which is slightly more efficient than using the VLAN_TAG_PRESENT. Fix the problem by getting rid of using the VLAN_TAG_PRESENT. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-18 16:02:53 -04:00
Toshiaki Makita	b90356ce17	bridge: Apply the PVID to priority-tagged frames IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames use the PVID for the port as its VID. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Apply the PVID to not only untagged frames but also priority-tagged frames. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-18 16:02:53 -04:00
Toshiaki Makita	8adff41c3d	bridge: Don't use VID 0 and 4095 in vlan filtering IEEE 802.1Q says that: - VID 0 shall not be configured as a PVID, or configured in any Filtering Database entry. - VID 4095 shall not be configured as a PVID, or transmitted in a tag header. This VID value may be used to indicate a wildcard match for the VID in management operations or Filtering Database entries. (See IEEE 802.1Q-2011 6.9.1 and Table 9-2) Don't accept adding these VIDs in the vlan_filtering implementation. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-18 16:02:52 -04:00
Vlad Yasevich	4b6c7879d8	bridge: Correctly clamp MAX forward_delay when enabling STP Commit `be4f154d5e` bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-17 16:12:15 -04:00
Pablo Neira Ayuso	99633ab29b	netfilter: nf_tables: complete net namespace support Register family per netnamespace to ensure that sets are only visible in its approapriate namespace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 18:00:59 +02:00
Patrick McHardy	96518518cc	netfilter: add nftables This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_.c the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 17:15:48 +02:00
Patrick McHardy	795aa6ef6a	netfilter: pass hook ops to hookfn Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-14 11:29:31 +02:00
Vlad Yasevich	f144febd93	bridge: update mdb expiration timer upon reports. commit `9f00b2e7cf` bridge: only expire the mdb entry when query is received changed the mdb expiration timer to be armed only when QUERY is received. Howerver, this causes issues in an environment where the multicast server socket comes and goes very fast while a client is trying to send traffic to it. The root cause is a race where a sequence of LEAVE followed by REPORT messages can race against QUERY messages generated in response to LEAVE. The QUERY ends up starting the expiration timer, and that timer can potentially expire after the new REPORT message has been received signaling the new join operation. This leads to a significant drop in multicast traffic and possible complete stall. The solution is to have REPORT messages update the expiration timer on entries that already exist. CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-11 00:56:47 -04:00
Joe Perches	d458cdf712	net:drivers/net: Miscellaneous conversions to ETH_ALEN Convert the memset/memcpy uses of 6 to ETH_ALEN where appropriate. Also convert some struct definitions and u8 array declarations of [6] to ETH_ALEN. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-02 17:04:45 -04:00
Mathias Krause	ca0a10672d	netfilter: ebt_ulog: fix info leaks The ulog messages leak heap bytes by the means of padding bytes and incompletely filled string arrays. Fix those by memset(0)'ing the whole struct before filling it. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-02 17:28:20 +02:00
Hong Zhiguo	716ec052d2	bridge: fix NULL pointer deref of br_port_get_rcu The NULL deref happens when br_handle_frame is called between these 2 lines of del_nbp: dev->priv_flags &= ~IFF_BRIDGE_PORT; /* --> br_handle_frame is called at this time */ netdev_rx_handler_unregister(dev); In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced without check but br_port_get_rcu(dev) returns NULL if: !(dev->priv_flags & IFF_BRIDGE_PORT) Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary here since we're in rcu_read_lock and we have synchronize_net() in netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT and by the previous patch, make sure br_port_get_rcu is called in bridging code. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-15 22:03:33 -04:00
Hong Zhiguo	1fb1754a8c	bridge: use br_port_get_rtnl within rtnl lock current br_port_get_rcu is problematic in bridging path (NULL deref). Change these calls in netlink path first. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-15 22:03:33 -04:00
Herbert Xu	be4f154d5e	bridge: Clamp forward_delay when enabling STP At some point limits were added to forward_delay. However, the limits are only enforced when STP is enabled. This created a scenario where you could have a value outside the allowed range while STP is disabled, which then stuck around even after STP is enabled. This patch fixes this by clamping the value when we enable STP. I had to move the locking around a bit to ensure that there is no window where someone could insert a value outside the range while we're in the middle of enabling STP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 23:32:14 -04:00
Chris Healy	9a0620133c	resubmit bridge: fix message_age_timer calculation This changes the message_age_timer calculation to use the BPDU's max age as opposed to the local bridge's max age. This is in accordance with section 8.6.2.3.2 Step 2 of the 802.1D-1998 sprecification. With the current implementation, when running with very large bridge diameters, convergance will not always occur even if a root bridge is configured to have a longer max age. Tested successfully on bridge diameters of ~200. Signed-off-by: Chris Healy <cphealy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-12 23:30:37 -04:00
Linus Torvalds	cc998ff881	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: "Noteworthy changes this time around: 1) Multicast rejoin support for team driver, from Jiri Pirko. 2) Centralize and simplify TCP RTT measurement handling in order to reduce the impact of bad RTO seeding from SYN/ACKs. Also, when both timestamps and local RTT measurements are available prefer the later because there are broken middleware devices which scramble the timestamp. From Yuchung Cheng. 3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel memory consumed to queue up unsend user data. From Eric Dumazet. 4) Add a "physical port ID" abstraction for network devices, from Jiri Pirko. 5) Add a "suppress" operation to influence fib_rules lookups, from Stefan Tomanek. 6) Add a networking development FAQ, from Paul Gortmaker. 7) Extend the information provided by tcp_probe and add ipv6 support, from Daniel Borkmann. 8) Use RCU locking more extensively in openvswitch data paths, from Pravin B Shelar. 9) Add SCTP support to openvswitch, from Joe Stringer. 10) Add EF10 chip support to SFC driver, from Ben Hutchings. 11) Add new SYNPROXY netfilter target, from Patrick McHardy. 12) Compute a rate approximation for sending in TCP sockets, and use this to more intelligently coalesce TSO frames. Furthermore, add a new packet scheduler which takes advantage of this estimate when available. From Eric Dumazet. 13) Allow AF_PACKET fanouts with random selection, from Daniel Borkmann. 14) Add ipv6 support to vxlan driver, from Cong Wang" Resolved conflicts as per discussion. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits) openvswitch: Fix alignment of struct sw_flow_key. netfilter: Fix build errors with xt_socket.c tcp: Add missing braces to do_tcp_setsockopt caif: Add missing braces to multiline if in cfctrl_linkup_request bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize vxlan: Fix kernel panic on device delete. net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls net: mvneta: properly disable HW PHY polling and ensure adjust_link() works icplus: Use netif_running to determine device state ethernet/arc/arc_emac: Fix huge delays in large file copies tuntap: orphan frags before trying to set tx timestamp tuntap: purge socket error queue on detach qlcnic: use standard NAPI weights ipv6:introduce function to find route for redirect bnx2x: VF RSS support - VF side bnx2x: VF RSS support - PF side vxlan: Notify drivers for listening UDP port changes net: usbnet: update addr_assign_type if appropriate driver/net: enic: update enic maintainers and driver driver/net: enic: Exposing symbols for Cisco's low latency driver ...	2013-09-05 14:54:29 -07:00
David S. Miller	06c54055be	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 14:58:52 -04:00
Linus Lüssing	3c3769e633	bridge: apply multicast snooping to IPv6 link-local, too The multicast snooping code should have matured enough to be safely applicable to IPv6 link-local multicast addresses (excluding the link-local all nodes address, ff02::1), too. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:53 -04:00
Linus Lüssing	8fad9c39f3	bridge: prevent flooding IPv6 packets that do not have a listener Currently if there is no listener for a certain group then IPv6 packets for that group are flooded on all ports, even though there might be no host and router interested in it on a port. With this commit they are only forwarded to ports with a multicast router. Just like commit `bd4265fe36` ("bridge: Only flood unregistered groups to routers") did for IPv4, let's do the same for IPv6 with the same reasoning. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-05 12:35:41 -04:00
Linus Torvalds	27703bb4a6	PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+ XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT BiHX6YFWtDDqRlVv8Q0F =LeaW -----END PGP SIGNATURE----- Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO	2013-09-04 17:31:11 -07:00
Daniel Borkmann	e3f5b17047	net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation Get rid of MLDV2_MRC and use our new macros for mantisse and exponent to calculate Maximum Response Delay out of the Maximum Response Code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-09-04 14:53:20 -04:00
Daniel Borkmann	2d98c29b6f	net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay While looking into MLDv1/v2 code, I noticed that bridging code does not convert it's max delay into jiffies for MLDv2 messages as we do in core IPv6' multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 17:56:47 -04:00
Linus Lüssing	cc0fdd8028	bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (`b00589af3b`) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-30 15:24:37 -04:00
Florian Fainelli	fd094808a0	bridge: inherit slave devices needed_headroom Some slave devices may have set a dev->needed_headroom value which is different than the default one, most likely in order to prepend a hardware descriptor in front of the Ethernet frame to send. Whenever a new slave is added to a bridge, ensure that we update the needed_headroom value accordingly to account for the slave needed_headroom value. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-29 15:17:09 -04:00
David S. Miller	b05930f5d1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-26 16:37:08 -04:00
Toshiaki Makita	ef40b7ef18	bridge: Use the correct bit length for bitmap functions in the VLAN code The VLAN code needs to know the length of the per-port VLAN bitmap to perform its most basic operations (retrieving VLAN informations, removing VLANs, forwarding database manipulation, etc). Unfortunately, in the current implementation we are using a macro that indicates the bitmap size in longs in places where the size in bits is expected, which in some cases can cause what appear to be random failures. Use the correct macro. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-20 23:35:57 -07:00
David S. Miller	2ff1cf12c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2013-08-16 15:37:26 -07:00
Wang Sheng-Hui	15401946f9	bridge: correct the comment for file br_sysfs_br.c br_sysfs_if.c is for sysfs attributes of bridge ports, while br_sysfs_br.c is for sysfs attributes of bridge itself. Correct the comment here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-07 10:35:06 -07:00
Linus Lüssing	248ba8ec05	bridge: don't try to update timers in case of broken MLD queries Currently we are reading an uninitialized value for the max_delay variable when snooping an MLD query message of invalid length and would update our timers with that. Fixing this by simply ignoring such broken MLD queries (just like we do for IGMP already). This is a regression introduced by: "bridge: disable snooping if there is no querier" (`b00589af3b`) Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-05 15:43:39 -07:00
stephen hemminger	762a3d89eb	bridge: fix rcu check warning in multicast port group Use of RCU here with out marked pointer and function doesn't match prototype with sparse. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-04 18:41:52 -07:00
David S. Miller	0e76a3a587	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-08-03 21:36:46 -07:00
Linus Lüssing	b00589af3b	bridge: disable snooping if there is no querier If there is no querier on a link then we won't get periodic reports and therefore won't be able to learn about multicast listeners behind ports, potentially leading to lost multicast packets, especially for multicast listeners that joined before the creation of the bridge. These lost multicast packets can appear since `c5c2326059` ("bridge: Add multicast_querier toggle and disable queries by default") in particular. With this patch we are flooding multicast packets if our querier is disabled and if we didn't detect any other querier. A grace period of the Maximum Response Delay of the querier is added to give multicast responses enough time to arrive and to be learned from before disabling the flooding behaviour again. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-31 17:40:21 -07:00
stephen hemminger	93d8bf9fb8	bridge: cleanup netpoll code This started out with fixing a sparse warning, then I realized that the wrapper function br_netpoll_info could just be collapsed away by rolling it into the enable code. Also, eliminate unnecessary goto's Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-26 15:24:32 -07:00
Jiri Pirko	4aa5dee4d9	net: convert resend IGMP to notifier event Until now, bond_resend_igmp_join_requests() looks for vlans attached to bonding device, bridge where bonding act as port manually. It does not care of other scenarios, like stacked bonds or team device above. Make this more generic and use netdev notifier to propagate the event to upper devices and to actually call ip_mc_rejoin_groups(). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-23 16:52:47 -07:00
Eric Dumazet	1faabf2aab	bridge: do not call setup_timer() multiple times commit `9f00b2e7cf` ("bridge: only expire the mdb entry when query is received") added a nasty bug as an active timer can be reinitialized. setup_timer() must be done once, no matter how many time mod_timer() is called. br_multicast_new_group() is the right place to do this. Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Diagnosed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-19 22:12:42 -07:00
Rusty Russell	8c6ffba0ed	PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. Sweep of the simple cases. Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-07-15 11:25:01 +09:30
Cong Wang	c7e8e8a8f7	bridge: fix some kernel warning in multicast timer Several people reported the warning: "kernel BUG at kernel/timer.c:729!" and the stack trace is: #7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905 #8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge] #9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge] #10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge] #11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge] #12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc #13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6 #14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad #15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17 #16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68 #17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101 #18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8 #19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun] #20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun] #21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1 #22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe #23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f #24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1 #25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292 this is due to I forgot to check if mp->timer is armed in br_multicast_del_pg(). This bug is introduced by commit `9f00b2e7cf` (bridge: only expire the mdb entry when query is received). Same for __br_mdb_del(). Tested-by: poma <pomidorabelisima@gmail.com> Reported-by: LiYonghua <809674045@qq.com> Reported-by: Robert Hancock <hancockrwd@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-07-06 18:12:47 -07:00
Stephen Hemminger	537f7f8494	bridge: check for zero ether address in fdb add The check for all-zero ether address was removed from rtnetlink core, since Vxlan uses all-zero ether address to signify default address. Need to add check back in for bridge. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>	2013-06-25 16:59:27 -07:00

... 2 3 4 5 6 ...

1213 Commits