Make release_net/hold_net noop for performance-hungry people. This is a debug
staff and should be used in the debug mode only.
Add check for net != NULL in hold/release calls. This will be required
later on.
[ Added minor simplifications suggested by Brian Haley. -DaveM ]
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
And no need in some IPPROTO_XXX enabling, since ipv6 code
doesn't have any filtering.
So, just set proper net and mark device with NETNS_LOCAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the ip_route_output_key(), dev_get_by_...() and ipv6_chk_addr()
calls are now stubbed with init_net.
Fortunately, all the places already have where to get the proper
net from.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move hashes in the struct ip6_tnl_net, replace tnls_xxx[] with
ip6n->tnlx_xxx[] and handle init and exit appropriately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the code, that reference it already has the ip6_tnl_net pointer,
so s/ip6_fb_tnl_dev/ip6n->fb_tnl_dev/ and move creation/releasing
code into net init/exit ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to ip6_tnl_lookup were stubbed with init_net - give them
a proper one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hashes and fallback device used in them will be per-net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sit-generated traffic enter the namespace.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set proper net and mark a new device as NETNS_LOCAL before registering.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. replace init_net stubs in ip_route_output_key() calls.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just move all the hashes on the sit_net structure and
patch the rest of the code appropriately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate and register one in sit_init_net, use sitn->fb_tunnel_dev
over the code and unregister one in sit_exit_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace introduced in the previous patch init_net stubs
with the proper net pointer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
... to make them prepared for future hashes and fallback device
move on the struct sit_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one was also disabled by default for sanity.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. set the proper net and mark as NETNS_LOCAL.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for the IPIP tunnel, there are some ip_route_output_key()
calls in there that require a proper net so give one to them.
And a proper net for the __get_dev_by_index hanging around.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Very similar to what was done for the IPIP code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Everything is prepared for this change now. Create on in
init callback, use it over the code and destroy on net exit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the part#2 of the patch #2 - get the proper net for
these functions. This change in a separate patch in order not
to get lost in a large previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fallback device and hashes are to become per-net, but many
code doesn't have anything to get the struct net pointer from.
So pass the proper net there with an extra argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the proper net before calling register_netdev and disable
the tunnel device netns changing.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some ip_route_output_key() calls in there that require
a proper net so give one to them.
Besides - give a proper net to a single __get_dev_by_index call
in ipip_tunnel_bind_dev().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Either net or ipip_net already exists in all the required
places, so just use one.
Besides, tune net_init and net_exit calls to respectively
initialize the hashes and destroy devices.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the part#2 of the previous patch - get the proper
net for these functions.
I make it in a separate patch, so that this change does not
get lost in a large previous patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hashes of tunnels will be per-net too, so prepare all the
functions that uses them for this change by adding an argument.
Use init_net temporarily in places, where the net does not exist
explicitly yet.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create on in ipip_init_net(), use it all over the code (the
proper place to get the net from already exists) and destroy
in ipip_net_exit().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When van device is moved to another namespace proc files,
related to this device, should also change one.
Use the netdev REGISTER and UNREGISTER event handlers for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is similar to what I've done for TUN - set the proper
net after device allocation and clean VLANs on net exit (use the
rtnl_kill_links helper finally).
Plus, drop explicit init_net usage and net != &init_net checks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes moving one on the struct vlan_net and
s/vlan_name_type/vn->name_type/ over the code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is created in a proper net, so make is show info, related
to this particular net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc_vlan_dir and proc_vlan_conf migrate on the struct
vlan_net and their creation uses the struct net.
The devices' entries use the corresponding device's net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All proc files will be created in each net, so prepare them for
this change now, not to mess it with real creation patch.
The net != &init_net checks in them are for git-bisect sanity,
but I will drop them soon.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike TUN, it is empty from the very beginning, and will
be eventually populated later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently vlan group is searched using one key - the ifindex.
We'll have to lookup the vlan_group by two keys - ifindex and
net. Turning the vlan_group lookup key to struct net_device
pointer will make this process easier.
Besides, this will eliminate one more place in the networking,
that assumes that indexes are unique in the kernel.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each potential list_del (happening from inside a ->dellink call)
is followed by goto restart, so there's no need in _safe iteration.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed can only be more strict than what was checked by the
earlier common case check for non-tail skbs, thus
cwnd_len <= needed will never match in that case anyway.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Returns non-zero if tp->out_of_order_queue was seen non-empty.
This allows tcp_try_rmem_schedule() to return early.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is not possible to read/write to an eeprom larger than
128k in size because the buffer used for temporarily storing the
eeprom contents is allocated using kmalloc. kmalloc can only allocate
a maximum of 128k depending on architecture.
Modified ethtool_get/set_eeprom to only allocate a page of memory and
then copy the eeprom a page at a time.
Updated original patch as per suggestions from Joe Perches.
Signed-off-by: Mandeep Singh Baines <msb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of hrtimers to support high resolution capabilities, when
provided by the system clocksource.
The conversion to hrtimers additionally discovered and solved an
unlikely race condition that has been reproduced under (unrealistic)
massive receive load, which can only be produced on vcan software devices.
[ Fix printf format warnings on 64-bit -DaveM ]
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that TIPC properly handles incoming messages
that have incorrect or unexpected formats. Most significantly,
it now ensures that each sl_buff has at least as much data as
the message header indicates it should, and that the entire
message header is stored contiguously; this prevents TIPC from
accidentally accessing memory that is not part of the sk_buff.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows TIPC to process incoming messages that are
stored in a fragmented sk_buff, by forcing the linearization
of any such messages it receives.
Note: This is an interim solution to allow TIPC to operate with
Ethernet devices that generate non-linear buffers (such as the
gianfar driver), until such time as the rest of TIPC is enhanced
to handle sk_buffs with multiple data areas.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch causes TIPC to allocate fast clonable sk_buffs,
rather than standard ones. This speeds up the cloning
operation done by the link code each time a message is sent
off-node.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates a null pointer check when discarding a
TIPC message buffer, since kfree_skb() already handles this
situation.
Acknowledgements to Florian Westphal (fw@strlen.de> for
suggesting this enhancement.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some people are getting this message a lot, and we have traced it to
broken access points that much too often send completely empty frames
(all bytes zeroed, which they shouldn't do at all.)
Since we cannot do anything about such frames in any case except the
special case where we're debugging an AP, just remove the message.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rfkill_switch_all() is supposed to only switch all the interfaces of a
given type, but does not actually do this; instead, it just switches
everything currently in the same state.
Add the necessary type check in.
(This fixes a bug I've been seeing while developing an rfkill laptop
driver, with both bluetooth and wireless simultaneously changing state
after only pressing either KEY_WLAN or KEY_BLUETOOTH).
Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the elastic array of void * pointer to the struct net.
The access rules are simple:
1. register the ops with register_pernet_gen_device to get
the id of your private pointer
2. call net_assign_generic() to put the private data on the
struct net (most preferably this should be done in the
->init callback of the ops registered)
3. do not store any private reference on the net_generic array;
4. do not change this pointer while the net is alive;
5. use the net_generic() to get the pointer.
When adding a new pointer, I copy the old array, replace it
with a new one and schedule the old for kfree after an RCU
grace period.
Since the net_generic explores the net->gen array inside rcu
read section and once set the net->gen->ptr[x] pointer never
changes, this grants us a safe access to generic pointers.
Quoting Paul: "... RCU is protecting -only- the net_generic
structure that net_generic() is traversing, and the [pointer]
returned by net_generic() is protected by a reference counter
in the upper-level struct net."
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make some per-net generic pointers, we need some way to address
them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
subsystems.
Addressing questions about potential checkpoint/restart problems:
these IDs are "lite-offsets" within the net structure and are by no
means supposed to be exported to the userspace.
Since it will be used in the nearest future by devices only (tun,
vlan, tunnels, bridge, etc), I make it resemble the functionality
of register_pernet_device().
The new ids is stored in the *id pointer _before_ calling the init
callback to make this id available in this callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_prune_queue() doesn't prune an out-of-order queue at all.
Therefore sk_rmem_schedule() can fail but the out-of-order queue isn't
pruned . This can lead to tcp deadlock state if the next two
conditions are held:
1. There are a sequence hole between last received in
order segment and segments enqueued to the out-of-order queue.
2. Size of all segments in the out-of-order queue is more than tcp_mem[2].
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even kernel 2.2.26 (sic) already contains the
#undef CONFIG_IRLAN_SEND_GRATUITOUS_ARP
with the comment "but for some reason the machine crashes if you use DHCP".
Either someone finally looks into this or it's simply time to remove
this dead code.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies TIPC's socket code to follow the same approach
used by other protocols. This change eliminates the need for a
mutex in the TIPC-specific portion of the socket protocol data
structure -- in its place, the standard Linux socket backlog queue
and associated locking routines are utilized. These changes fix
a long-standing receive queue bug on SMP systems, and also enable
individual read and write threads to utilize a socket without
unnecessarily interfering with each other.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes TIPC's connect routine to conform to Linux
kernel style norms of indentation, line length, etc.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch causes TIPC to return an error indication if the non-
blocking form of connect() is requested (which TIPC does not yet
support).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bug that allowed TIPC to queue 1 more message
than allowed by the socket receive queue threshold limits. The
patch also improves the threshold code's logic and naming to help
prevent this sort of error from recurring in the future.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that padding bytes appearing at the end of
an incoming TIPC message are not returned as valid stream data.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a stream socket to receive data from multiple
TIPC messages in its receive queue, without requiring the use of
the MSG_WAITALL flag.
Acknowledgements to Florian Westphal <fw-tipc@strlen.de> for
identifying this issue and suggesting how to correct it.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimizes the receive path for SOCK_DGRAM and SOCK_RDM
messages by skipping over code that handles connection-based flow
control.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC_H_MAJ(parentid) for root classes is the same as for ingress, and if
ingress qdisc is created qdisc_lookup() returns its pointer (without
ingress NULL is returned). After this all qdisc_lookups give the same,
and we get endless loop. (I don't know how this could hide for so long
- it should trigger with every leaf class deleted if it's qdisc isn't
empty.)
After this fix qdisc_lookup() is omitted both for ingress and root
parents, but looking for root is only wasting a little time here...
Many thanks to Enrico Demarin for finding a test for catching this
bug, which probably bothered quite a lot of admins.
Reported-by: Enrico Demarin <enrico@superclick.com>,
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SECURITY_NETWORK_XFRM is undefined the following warnings appears:
net/xfrm/xfrm_user.c: In function 'xfrm_add_pol_expire':
net/xfrm/xfrm_user.c:1576: warning: 'ctx' may be used uninitialized in this function
net/xfrm/xfrm_user.c: In function 'xfrm_get_policy':
net/xfrm/xfrm_user.c:1340: warning: 'ctx' may be used uninitialized in this function
(security_xfrm_policy_alloc is noop for the case).
It seems that they are result of the commit
03e1ad7b5d ("LSM: Make the Labeled IPsec
hooks more stack friendly")
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
[BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter
[NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
[IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface.
[IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled.
[IPV6]: Use appropriate sock tclass setting for routing lookup.
[IPV6]: IPv6 extension header structures need to be packed.
[IPV6]: Fix ipv6 address fetching in raw6_icmp_error().
[NET]: Return more appropriate error from eth_validate_addr().
[ISDN]: Do not validate ISDN net device address prior to interface-up
[NET]: Fix kernel-doc for skb_segment
[SOCK] sk_stamp: should be initialized to ktime_set(-1L, 0)
net: check for underlength tap writes
net: make struct tun_struct private to tun.c
[SCTP]: IPv4 vs IPv6 addresses mess in sctp_inet[6]addr_event.
[SCTP]: Fix compiler warning about const qualifiers
[SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK
[SCTP]: Add check for hmac_algo parameter in sctp_verify_param()
[NET_SCHED] cls_u32: refcounting fix for u32_delete()
[DCCP]: Fix skb->cb conflicts with IP
[AX25]: Potential ax25_uid_assoc-s leaks on module unload.
...
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.
Can we please add a comment about why we do not need one?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These sockets now have a bit other names and are no longer global.
Shame on me, I haven't provided a good comment for this when
sending DCCP netnsization patches.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the ebtables nflog watcher to the kernel in order to
allow ebtables log through the nfnetlink_log backend.
Signed-off-by: Peter Warasin <peter@endian.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Directly call IPv4 and IPv6 variants where the address family is
easily known.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Invalid states can cause out-of-bound memory accesses of the state table.
Also don't insist on having a new state contained in the netlink message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.
The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
New extensions may only be added to unconfirmed conntracks to avoid races
when reallocating the storage.
Also change NF_CT_ASSERT to use WARN_ON to get backtraces.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.
The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Locally generated ICMP packets have a reference to the conntrack entry
of the original packet manually attached by icmp_send(). Therefore the
check for locally originated untracked ICMP redirects can never be
true.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.
Return EAGAIN so userspace can retry the operation.
Fixes (with current iptables SVN version) netfilter bugzilla #104.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Some callers pass uninitialized structures, clear the address to make
sure later comparisions work properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 9335f047fe aka
"[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
added per-netns _view_ of iptables rules. They were shown to user, but
ignored by filtering code. Now that it's possible to at least ping loopback,
per-netns tables can affect filtering decisions.
netns is taken in case of
PRE_ROUTING, LOCAL_IN -- from in device,
POST_ROUTING, LOCAL_OUT -- from out device,
FORWARD -- from in device which should be equal to out device's netns.
This code is relatively new, so BUG_ON was plugged.
Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
(overwhelming majority), b) consolidate code in one place -- similar
changes will be done in ipv6 and arp netfilter code.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Patch splits creation of /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack
and net.netfilter hierarchy into their own functions with dummy ones
if PROC_FS or SYSCTL is not set. Also, remove dead "ret = 0" write
while I'm at it.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The bridge netfilter code attaches a fake dst_entry with a pointer to a
fake net_device structure to skbs it passes up to IPv4 netfilter. This
leads to crashes when the skb is passed to __ip_route_output_key when
dereferencing the namespace pointer.
Since bridging can currently only operate in the init_net namespace,
the easiest fix for now is to initialize the nd_net pointer of the
fake net_device struct to &init_net.
Should fix bugzilla 10323: http://bugzilla.kernel.org/show_bug.cgi?id=10323
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider we are putting a clusterip_config entry with the "entries"
count == 1, and on the other CPU there's a clusterip_config_find_get
in progress:
CPU1: CPU2:
clusterip_config_entry_put: clusterip_config_find_get:
if (atomic_dec_and_test(&c->entries)) {
/* true */
read_lock_bh(&clusterip_lock);
c = __clusterip_config_find(clusterip);
/* found - it's still in list */
...
atomic_inc(&c->entries);
read_unlock_bh(&clusterip_lock);
write_lock_bh(&clusterip_lock);
list_del(&c->list);
write_unlock_bh(&clusterip_lock);
...
dev_put(c->dev);
Oops! We have an entry returned by the clusterip_config_find_get,
which is a) not in list b) has a stale dev pointer.
The problems will happen when the CPU2 will release the entry - it
will remove it from the list for the 2nd time, thus spoiling it, and
will put a stale dev pointer.
The fix is to make atomic_dec_and_test under the clusterip_lock.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This expresses __skb_append in terms of __skb_queue_after, exploiting that
__skb_append(old, new, list) = __skb_queue_after(list, old, new).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches adds a call to increment IPSTATS_MIB_OUTFORWDATAGRAMS
when forwarding the packet in ip6_mr_forward() in the IPv6 multicast
routing module (net/ipv6/ip6mr.c).
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As far as I can remember, I was going to disable privacy extensions
on all "tunnel" interfaces. Disable it on ip6-ip6 interface as well.
Also, just remove ifdefs for SIT for simplicity.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since NETDEV_REGISTER notifier chain is responsible for creating
inet6_dev{}, we do not need to call ipv6_find_idev() directly here.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes kernel bugzilla 10437
Based almost entirely upon a patch by Dmitry Butskoy.
When deciding what raw sockets to deliver the ICMPv6
to, we should use the addresses in the ICMPv6 quoted
IPV6 header, not the top-level one.
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Bolle wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9923 would have been much easier to
> track down if eth_validate_addr() would somehow complain aloud if an address
> is invalid. Shouldn't it make at least some noise?
I guess it should return -EADDRNOTAVAIL similar to eth_mac_addr()
when validation fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet6_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.
No more things to do for dccpv6, since routing is OK and the
ipv4-like transport layer filtering is not done for ipv6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v6_ctl_send_reset.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
They will be responsible for ctl socket initialization, but
currently they are void.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call uses the sock to get the net to lookup the routing
in. With CONFIG_NET_NS this code will OOPS, since the sk ptr
is NULL.
After looking inside the ip6_dst_lookup and drawing the analogy
with respective ipv6 code, it seems, that the dccp ctl socket
is a good candidate for the first argument.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables sockets creation with IPPROTO_DCCP and enables
the ip level to pass DCCP packets to the DCCP level.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet_lookup family of functions requires a net to lookup
a socket in, so give a proper one to them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dccp_v4_route_skb used in dccp_v4_ctl_send_reset, currently
works with init_net's routing tables - fix it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the call to inet_ctl_sock_create to init callback (and
inet_ctl_sock_destroy to exit one) and use proper ctl sock
in dccp_v4_ctl_send_reset.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And replace all its usage with init_net's socket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
They will be responsible for ctl socket initialization, but
currently they are void.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace seq_open with seq_open_net and remove tcp_seq_release
completely. seq_release_net will do this job just fine.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to create seq_operations for each instance of 'netstat'.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_proc_register/tcp_proc_unregister are called with a static pointer only.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel-doc comment for skb_segment is clearly wrong. This states
what it actually does.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_reuse is declared as "unsigned char", but is set as type valbool in net/core/sock.c.
There is no other place in net/ where sk->sk_reuse is set to a value > 1, so the test
"sk_reuse > 1" can not be true.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies TIPC's socket code to use standard kernel
routines to handle time conversions between jiffies and ms.
This ensures proper operation even when HZ isn't 1000.
Acknowledgements to Eric Sesterhenn <snakebyte@gmx.de> for
identifying this issue and proposing a solution.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates re-initialization of the standard socket
wait queue used for sleeping in TIPC's socket creation code.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm_get_policy() and xfrm_add_pol_expire() put some rather large structs
on the stack to work around the LSM API. This patch attempts to fix that
problem by changing the LSM API to require only the relevant "security"
pointers instead of the entire SPD entry; we do this for all of the
security_xfrm_policy*() functions to keep things consistent.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'asoc' parameter to sctp_cmd_hb_timer_update() is unused, and
we can remove it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replacing (almost) all invocations of list_for_each() with
list_for_each_entry() tightens up the code and allows for the deletion
of numerous list iterator variables that are no longer necessary.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently I posted a patch to add some informational items to
/proc/net/sctp/assocs. All the information is correct, but because
of how the seqfile show operation is laid out, some of the formatting
is backwards. This patch corrects that formatting, so that the new
information appears at the end of each line, rather than in the middle.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All IP addresses that are present in a system are duplicated on
struct sctp_sockaddr_entry. They are linked in the global list
called sctp_local_addr_list. And this struct unions IPv4 and IPv6
addresses.
So, there can be rare case, when a sockaddr_in.sin_addr coincides
with the corresponding part of the sockaddr_in6 and the notifier
for IPv4 will carry away an IPv6 entry.
The fix is to check the family before comparing the addresses.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 3 warnings about discarding const qualifiers:
net/sctp/ulpevent.c:862: warning: passing argument 1 of 'sctp_event2skb' discards qualifiers from pointer target type
net/sctp/sm_statefuns.c:4393: warning: passing argument 1 of 'SCTP_ASOC' discards qualifiers from pointer target type
net/sctp/socket.c:5874: warning: passing argument 1 of 'cmsg_nxthdr' discards qualifiers from pointer target type
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving an error length INIT-ACK during COOKIE-WAIT,
a 0-vtag ABORT will be responsed. This action violates the
protocol apparently. This patch achieves the following things.
1 If the INIT-ACK contains all the fixed parameters, use init-tag
recorded from INIT-ACK as vtag.
2 If the INIT-ACK doesn't contain all the fixed parameters,
just reflect its vtag.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4890 has the following text:
The HMAC algorithm based on SHA-1 MUST be supported and
included in the HMAC-ALGO parameter.
As a result, we need to check in sctp_verify_param() that HMAC_SHA1 is
present in the list. If not, we should probably treat this as a
protocol violation.
It should also be a protocol violation if the HMAC parameter is empty.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deleting of nonroot hnodes mostly doesn't work in u32_delete():
refcnt == 1 is expected, but such hnodes' refcnts are initialized
with 0 and charged only with "link" nodes. Now they'll start with
1 like usual. Thanks to Patrick McHardy for an improving suggestion.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() and the other IP output functions expect to get a skb
with clear or properly initialized skb->cb. Unlike TCP and UDP, the
dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
so the DCCP-specific data is interpreted by the IP output functions.
This can cause false negatives for the conditional POST_ROUTING hook
invocation, making the packet bypass the hook.
Add a inet_skb_parm/inet6_skb_parm union to the beginning of
dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
sure it fits in the cb.
[ Combined with patch from Gerrit Renker to remove two now unnecessary
memsets of IPCB(skb)->opt ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ax25_uid_free call walks the ax25_uid_list and releases entries
from it. The problem is that after the fisrt call to hlist_del_init
the hlist_for_each_entry (which hides behind the ax25_uid_for_each)
will consider the current position to be the last and will return.
Thus, the whole list will be left not freed.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a difference between IPv4 and IPv6 when sending packets
to the unspecified address (either 0.0.0.0 or ::) when using raw or
un-connected UDP sockets. There are two cases where IPv6 either fails
to send anything, or sends with the destination address set to ::. For
example:
--> ping -c1 0.0.0.0
PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms
--> ping6 -c1 ::
PING ::(::) 56 data bytes
ping: sendmsg: Invalid argument
Doing a sendto("0.0.0.0") reveals:
10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100
Doing a sendto("::") reveals:
10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100
If you issue a connect() first in the UDP case, it will be sent to ::1,
similar to what happens with TCP.
This restores the BSD-ism.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
- This patch adjusts IPv6 multicast routing module, net/ipv6/ip6mr.c,
to use mroute6 header definitions instead of mroute.
(MFC6_LINES instead of MFC_LINES, MAXMIFS instead of MAXVIFS, mifi_t
instead of vifi_t.)
- In addition, inclusion of some headers was removed as it is not needed.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Check length of setsockopt's optval, which provided by user, before copy it
from user space.
For POSIX compliant, return -EINVAL for setsockopt of short lengths.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
MIP6_OPT_PAD_X are actually for paddings in destination
option header. Replace them with our standard IPV6_TLV_PADX.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* 'docs' of git://git.lwn.net/linux-2.6:
Add additional examples in Documentation/spinlocks.txt
Move sched-rt-group.txt to scheduler/
Documentation: move rpc-cache.txt to filesystems/
Documentation: move nfsroot.txt to filesystems/
Spell out behavior of atomic_dec_and_lock() in kerneldoc
Fix a typo in highres.txt
Fixes to the seq_file document
Fill out information on patch tags in SubmittingPatches
Add the seq_file documentation
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
[NETNS][IPV6] tcp - assign the netns for timewait sockets
[IPV4]: Fix byte value boundary check in do_ip_getsockopt().
BNX2X: Correct bringing chip out of reset
[NETFILTER]: nf_nat: autoload IPv4 connection tracking
[NETFILTER]: xt_hashlimit: fix mask calculation
[XFRM]: xfrm_user: fix selector family initialization
rt61pci: rt61pci_beacon_update do not free skb twice
ssb-mipscore: Fix interrupt vectors
ssb-pcicore: Fix IRQ TPS flag handling
mac80211: use short_preamble mode from capability if ERP IE not present
[NET]: Undo code bloat in hot paths due to print_mac().
[TCP]: Don't allow FRTO to take place while MTU is being probed
[TCP]: tcp_simple_retransmit can cause S+L
[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb
[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack
nl80211: fix STA AID bug
b43legacy: fix bcm4303 crash
iwlwifi: fix n-band association problem
ipw2200: set MAC address on radiotap interface
libertas: fix mode initialization problem
| net/ipv6/ipv6_sockglue.c:162:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ipv6_sockglue.c:175:16: warning: symbol 'net' shadows an earlier one
| net/ipv6/ipv6_sockglue.c:111:13: originally declared here
| net/ipv6/ip6mr.c:1241:10: warning: symbol 'ret' shadows an earlier one
| net/ipv6/ip6mr.c:1163:6: originally declared here
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Copy the network namespace from the socket to the timewait socket.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Mark Lord <mlord@pobox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comparison in ip_route_input is a hot path, by recoding the C
"and" as bit operations, fewer conditional branches get generated
so the code should be faster. Maybe someday Gcc will be smart
enough to do this?
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid unneeded test in the case where object to be freed
has to be a leaf. Don't need to use the generic tnode_free()
function, instead just setup leaf to be freed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trie pointer is passed down to flush_list and flush_leaf
but never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the use of SACK and window scaling when syncookies are used
and the client supports tcp timestamps. Options are encoded into
the timestamp sent in the syn-ack and restored from the timestamp
echo when the ack is received.
Based on earlier work by Glenn Griffin.
This patch avoids increasing the size of structs by encoding TCP
options into the least significant bits of the timestamp and
by not using any 'timestamp offset'.
The downside is that the timestamp sent in the packet after the synack
will increase by several seconds.
changes since v1:
don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
and have ipv6/syncookies.c use it.
Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use vmalloc rather than alloc_pages to avoid wasting memory.
The problem is that tnode structure has a power of 2 sized array,
plus a header. So the current code wastes almost half the memory
allocated because it always needs the next bigger size to hold
that small header.
This is similar to an earlier patch by Eric, but instead of a list
and lock, I used a workqueue to handle the fact that vfree can't
be done in interrupt context.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we register the iucv bus after the infrastructure is ready,
userspace can start relying on it when it receives the uevent
for the bus.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This BUG_ON is not needed, since all (debug) checks are also done
in smp_call_function() which gets called by this function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SKF_ADF_NLATTR searches for a netlink attribute, which avoids manually
parsing and walking attributes. It takes the offset at which to start
searching in the 'A' register and the attribute type in the 'X' register
and returns the offset in the 'A' register. When the attribute is not
found it returns zero.
A top-level attribute can be located using a filter like this
(example for nfnetlink, using struct nfgenmsg):
...
{
/* A = offset of first attribute */
.code = BPF_LD | BPF_IMM,
.k = sizeof(struct nlmsghdr) + sizeof(struct nfgenmsg)
},
{
/* X = CTA_PROTOINFO */
.code = BPF_LDX | BPF_IMM,
.k = CTA_PROTOINFO,
},
{
/* A = netlink attribute offset */
.code = BPF_LD | BPF_B | BPF_ABS,
.k = SKF_AD_OFF + SKF_AD_NLATTR
},
{
/* Exit if not found */
.code = BPF_JMP | BPF_JEQ | BPF_K,
.k = 0,
.jt = <error>
},
...
A nested attribute below the CTA_PROTOINFO attribute would then
be parsed like this:
...
{
/* A += sizeof(struct nlattr) */
.code = BPF_ALU | BPF_ADD | BPF_K,
.k = sizeof(struct nlattr),
},
{
/* X = CTA_PROTOINFO_TCP */
.code = BPF_LDX | BPF_IMM,
.k = CTA_PROTOINFO_TCP,
},
{
/* A = netlink attribute offset */
.code = BPF_LD | BPF_B | BPF_ABS,
.k = SKF_AD_OFF + SKF_AD_NLATTR
},
...
The data of an attribute can be loaded into 'A' like this:
...
{
/* X = A (attribute offset) */
.code = BPF_MISC | BPF_TAX,
},
{
/* A = skb->data[X + k] */
.code = BPF_LD | BPF_B | BPF_IND,
.k = sizeof(struct nlattr),
},
...
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should not count it if the allocation of the object
is failed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should not count it if the allocation of this object
failed.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No urgency on the rehash interval timer, so mark it as deferrable.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since route hash is a triple, use jhash_3words rather doing the mixing
directly. This should be as fast and give better distribution.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't mark functions that are large as inline, let compiler decide.
Also, use inline rather than __inline__.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sk_filter function is too big to be inlined. This saves 2296 bytes
of text on allyesconfig.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some minor style cleanups:
* Move __KERNEL__ definitions to one place in filter.h
* Use const for sk_filter_len
* Line wrapping
* Put EXPORT_SYMBOL next to function definition
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes kernel bugzilla 10371.
As reported by M.Piechaczek@osmosys.tv, if we try to grab a
char sized socket option value, as in:
unsigned char ttl = 255;
socklen_t len = sizeof(ttl);
setsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len);
getsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len);
The ttl returned will be wrong on big-endian, and on both little-
endian and big-endian the next three bytes in userspace are written
with garbage.
It's because of this test in do_ip_getsockopt():
if (len < sizeof(int) && len > 0 && val>=0 && val<255) {
It should allow a 'val' of 255 to pass here, but it doesn't so it
copies a full 'int' back to userspace.
On little-endian that will write the correct value into the location
but it spams on the next three bytes in userspace. On big endian it
writes the wrong value into the location and spams the next three
bytes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this patch, the generic L3 tracker would kick in
if nf_conntrack_ipv4 was not loaded before nf_nat, which
would lead to translation problems with ICMP errors.
NAT does not make sense without IPv4 connection tracking
anyway, so just add a call to need_ipv4_conntrack().
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shifts larger than the data type are undefined, don't try to shift
an u32 by 32. Also remove some special-casing of bitmasks divisible
by 32.
Based on patch by Jan Engelhardt <jengelh@computergmbh.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit df9dcb45 ([IPSEC]: Fix inter address family IPsec tunnel handling)
broke openswan by removing the selector initialization for tunnel mode
in case it is uninitialized.
This patch restores the initialization, fixing openswan, but probably
breaking inter-family tunnels again (unknown since the patch author
disappeared). The correct thing for inter-family tunnels is probably
to simply initialize the selector family explicitly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When associating to a b-only AP where there is no ERP IE, short preamble
mode is left at previous state (probably also protection mode). In this
case, disable protection and use short preamble mode as specified in
capability field. The same is done if capability field is changed on-the-fly.
Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Commit 510deb0d was supposed to move the xprt_create_transport() call in
rpc_create(), but neglected to remove the old call site. This resulted in
a transport leak after every rpc_create() call.
This leak is present in 2.6.24 and 2.6.25.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix a problem in _copy_to_pages(), whereby it may call flush_dcache_page()
with an invalid pointer due to the fact that 'pgto' gets incremented
beyond the end of the page array. Fix is to exit the loop without this
unnecessary increment of pgto.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If print_mac() is used inside of a pr_debug() the compiler
can't see that the call is redundant so still performs it
even of pr_debug() ends up being a nop.
So don't use print_mac() in such cases in hot code paths,
use MAC_FMT et al. instead.
As noted by Joe Perches, pr_debug() could be modified to
handle this better, but that is a change to an interface
used by the entire kernel and thus needs to be validated
carefully. This here is thus the less risky fix for
2.6.25
Signed-off-by: David S. Miller <davem@davemloft.net>
The default_key symlink points to the key index rather than
they key counter, fix it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch renames all mac80211 files (except ieee80211_i.h) to get rid
of the useless ieee80211_ prefix.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Up to now, key manipulation is supposed to run under RTNL to
avoid concurrent manipulations and also allow the set_key()
hardware callback to sleep. This is not feasible because STA
structs are rcu-protected and thus a lot of operations there
cannot take the RTNL. Also, key references are rcu-protected
so we cannot do things atomically.
This patch changes key locking completely:
* key operations are now atomic
* hardware crypto offload is enabled and disabled from
a workqueue, due to that key freeing is also delayed
* debugfs code is also run from a workqueue
* keys reference STAs (and vice versa!) so during STA
unlink the STAs key reference is removed but not the
keys STA reference, to avoid races key todo work is
run before STA destruction.
* fewer STA operations now need the RTNL which was
required due to key operations
This fixes the locking problems lockdep pointed out and also
makes things more light-weight because the rtnl isn't required
as much.
Note that the key todo lock/key mutex are global locks, this
is not required, of course, they could be per-hardware instead.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a STA is supposed to be unlinked but is pinned, it still needs
to be unlinked from all structures. Only at the end of the unlink
process should we check for pin status and invalidate the callers
reference if it is pinned. Move the pin status check down.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These two symbols are used only in ifdeffed function. Move them to that
section too.
net/mac80211/sta_info.c:387: warning: `__sta_info_pin' defined but not used
net/mac80211/sta_info.c:397: warning: `__sta_info_unpin' defined but not used
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Michael Wu <flamingice@sourmilk.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch contains next issues:
1 - prevents "stop BA session" multiple warnings
2 - adds debug print to stop Rx BA session flow
3 - adds EOL in one debug print
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add new API to MAC80211 to allow low level driver to
notify MAC with driver status.
Signed-off-by: Mohamed Abbas <mabbas@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The ieee80211_ioctl_giwrate() ioctl handler doesn't rcu_read_lock()
its access to the sta table, fix it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Unfortunately, debugfs can be made to access invalid memory by
open()ing a file and then waiting until the corresponding debugfs
file has been removed (and, probably, the underlying object.)
That could be exploited by any user if the user is able to open
debugfs files and can cause networking devices, STA entries or
similar to disappear which is quite easy to do.
Hence, all debugfs files should be root-only.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When drivers receive change notification they may do work that
will enable the changes to take effect. For example, if new association
the device needs to be programmed with this information.
Give the driver chance to make the changes before notifying the
upper layer - thus preventing race condition where upper layer
attempts to utilize state that may not be configured yet.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Really doesn't need to be defined four times.
Also, while at it, remove a useless macro (IEEE80211_ALIGN32_PAD)
and a function prototype for a function we don't actually have
(ieee80211_set_compression.)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Because we queue the sta-debugfs-adding work on our mac80211
workqueue (which needs to be flushed under RTNL) and that work
needs the RTNL, it can currently deadlock, thanks to Reinette
Chatre for pointing out the lockdep warning about this.
This patch fixes it by moving this work to the common kernel
workqueue (using schedule_work) and canceling it as appropriate.
It also fixes a related problem: When a STA is pinned by the
debugfs adding work and sta_info_flush() runs concurrently
it is not guaranteed that all STAs are removed from the driver
before the corresponding interface is removed which may lead
to bugs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is necessary for the upcoming Accesspoint patch for p54.
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds assocation capability, timestamp (tsf) and beacon interval
to bss_conf. This is required for successful assocation of iwlwifi drivers
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch eliminates the use of conf_ht, replacing it with
bss_info_changed.
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 6c4711b469.
That patch breaks mesh config comparison between beacons/probe reponses, so
every beacon from a mesh network would be added as a new bss. Since the
comparison has to be performed for every received beacon I believe it is best to
save the mesh config in a format easy to compare, rather than do a bunch of
unaligned accesses to compare field by field.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
MTU probe can cause some remedies for FRTO because the normal
packet ordering may be violated allowing FRTO to make a wrong
decision (it might not be that serious threat for anything
though). Thus it's safer to not run FRTO while MTU probe is
underway.
It seems that the basic FRTO variant should also look for an
skb at probe_seq.start to check if that's retransmitted one
but I didn't implement it now (plain seqno in window check
isn't robust against wraparounds).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes Bugzilla #10384
tcp_simple_retransmit does L increment without any checking
whatsoever for overflowing S+L when Reno is in use.
The simplest scenario I can currently think of is rather
complex in practice (there might be some more straightforward
cases though). Ie., if mss is reduced during mtu probing, it
may end up marking everything lost and if some duplicate ACKs
arrived prior to that sacked_out will be non-zero as well,
leading to S+L > packets_out, tcp_clean_rtx_queue on the next
cumulative ACK or tcp_fastretrans_alert on the next duplicate
ACK will fix the S counter.
More straightforward (but questionable) solution would be to
just call tcp_reset_reno_sack() in tcp_simple_retransmit but
it would negatively impact the probe's retransmission, ie.,
the retransmissions would not occur if some duplicate ACKs
had arrived.
So I had to add reno sacked_out reseting to CA_Loss state
when the first cumulative ACK arrives (this stale sacked_out
might actually be the explanation for the reports of left_out
overflows in kernel prior to 2.6.23 and S+L overflow reports
of 2.6.24). However, this alone won't be enough to fix kernel
before 2.6.24 because it is building on top of the commit
1b6d427bb7 ([TCP]: Reduce sacked_out with reno when purging
write_queue) to keep the sacked_out from overflowing.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes a long-standing bug which makes NewReno recovery crippled.
With GSO the whole head skb was marked as LOST which is in
violation of NewReno procedure that only wants to mark one packet
and ended up breaking our TCP code by causing counter overflow
because our code was built on top of assumption about valid
NewReno procedure. This manifested as triggering a WARN_ON for
the overflow in a number of places.
It seems relatively safe alternative to just do nothing if
tcp_fragment fails due to oom because another duplicate ACK is
likely to be received soon and the fragmentation will be retried.
Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was
lucky enough to be able to reproduce this so that the warning
for the overflow was hit. It's not as easy task as it seems even
if this bug happens quite often because the amount of outstanding
data is pretty significant for the mismarkings to lead to an
overflow.
Because it's very late in 2.6.25-rc cycle (if this even makes in
time), I didn't want to touch anything with SACK enabled here.
Fragmenting might be useful for it as well but it's more or less
a policy decision rather than mandatory fix. Thus there's no need
to rush and we can postpone considering tcp_fragment with SACK
for 2.6.26.
In 2.6.24 and earlier, this very same bug existed but the effect
is slightly different because of a small changes in the if
conditions that fit to the patch's context. With them nothing
got lost marker and thus no retransmissions happened.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fast retransmission can be forced locally to the rfc3517
branch in tcp_update_scoreboard instead of making such fragile
constructs deeper in tcp_mark_head_lost.
This is necessary for the next patch which must not have
loopholes for cnt > packets check. As one can notice,
readability got some improvements too because of this :-).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the STA AID setting and actually makes hostapd/mac80211
work properly in presence of power-saving stations.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
fix endian lossage in forcedeth
net/tokenring/olympic.c section fixes
net: marvell.c fix sparse shadowed variable warning
[VLAN]: Fix egress priority mappings leak.
[TG3]: Add PHY workaround for 5784
[NET]: srandom32 fixes for networking v2
[IPV6]: Fix refcounting for anycast dst entries.
[IPV6]: inet6_dev on loopback should be kept until namespace stop.
[IPV6]: Event type in addrconf_ifdown is mis-used.
[ICMP]: Ensure that ICMP relookup maintains status quo
This bug resulted in compilation error on 64bit machines.
Pointed out by Rami Rosen <roszenrami@gmail.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
These entries are allocated in vlan_dev_set_egress_priority,
but are never released and leaks on vlan device removal.
Drop these in vlan's ->uninit callback - after the device is
brought down and everyone is notified about it is going to
be unregistered.
Found during testing vlan netnsization patchset.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have been using __NET_IPV6_MAX for adjusting the size of array
for sysctl table, but it does not work any longer because of the
deprecation of NET_IPV6_xxx constants. Let's use DEVCONF_MAX
instead.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Do this by replacing sock_create_kern with inet_ctl_sock_create.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
uc_ttl is initialized in inet(6)_create and never changed except
setsockopt ioctl. Remove this assignment.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sock_create_kern with inet_ctl_sock_create.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a generic requirement, so make inet_ctl_sock_create namespace
aware and create a inet_ctl_sock_destroy wrapper around
sk_release_kernel.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All upper protocol layers are already use sock internally.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_proc->(un)hash is noop right now, so the unification is correct.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This call is nothing common with INET connection sockets code. It
simply creates an unhashes kernel sockets for protocol messages.
Move the new call into af_inet.c after the rename.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This seems a purism as module can't be unloaded, but though if cleanup
method is present it should be correct and clean all staff created.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dccp_v(4|6)_ctl_socket with sock to unify a code with TCP/ICMP.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace tcp_socket with tcp_sock. This is more effective (less
derefferences on fast paths). Additionally, the approach is unified to
one used in ICMP.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anycast DST entries allocated inside ipv6_dev_ac_inc are leaked when
network device is stopped without removing IPv6 addresses from it. The
bug has been observed in the reality on 2.6.18-rhel5 kernel.
In the above case addrconf_ifdown marks all entries as obsolete and
ip6_del_rt called from __ipv6_dev_ac_dec returns ENOENT. The
referrence is not dropped.
The fix is simple. DST entry should not keep referrence when stored in
the FIB6 tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the other case it will be destroyed when last address will be removed
from lo inside a namespace. This will break IPv6 in several places. The
most obvious one is ip6_dst_ifdown.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf_ifdown is broken in respect to the usage of how
parameter. This function is called with (event != NETDEV_DOWN) and (2)
on the IPv6 stop. It the latter case inet6_dev from loopback device
should be destroyed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ICMP relookup path is only meant to modify behaviour when
appropriate IPsec policies are in place and marked as requiring
relookups. It is certainly not meant to modify behaviour when
IPsec policies don't exist at all.
However, due to an oversight on the error paths existing behaviour
may in fact change should one of the relookup steps fail.
This patch corrects this by redirecting all errors on relookup
failures to the previous code path. That is, if the initial
xfrm_lookup let the packet pass, we will stand by that decision
should the relookup fail due to an error.
This should be safe from a security point-of-view because compliant
systems must install a default deny policy so the packet would'nt
have passed in that case.
Many thanks to Julian Anastasov for pointing out this error.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the Linux the Intra-Site Automatic Tunnel Addressing
Protocol (ISATAP) implementation. It places the ISATAP potential router
list (PRL) in the kernel and adds three new private ioctls for PRL
management.
[Add several changes of structure name, constant names etc. - yoshfuji]
Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
[VLAN]: Proc entry is not renamed when vlan device name changes.
[IPV6]: Fix ICMP relookup error path dst leak
[ATM] drivers/atm/iphase.c: compilation warning fix
IPv6: do not create temporary adresses with too short preferred lifetime
IPv6: only update the lifetime of the relevant temporary address
bluetooth : __rfcomm_dlc_close lock fix
bluetooth : use lockdep sub-classes for diffrent bluetooth protocol
[ROSE/AX25] af_rose: rose_release() fix
mac80211: correct use_short_preamble handling
b43: Fix PCMCIA IRQ routing
b43: Add DMA mapping failure messages
mac80211: trigger ieee80211_sta_work after opening interface
[LLC]: skb allocation size for responses
[IP] UDP: Use SEQ_START_TOKEN.
[NET]: Remove Documentation/networking/sk98lin.txt
[ATM] atm/idt77252.c: Make 2 functions static
[ATM]: Make atm/he.c:read_prom_byte() static
[IPV6] MCAST: Ensure to check multicast listener(s).
[LLC]: Kill llc_station_mac_sa symbol export.
forcedeth: fix locking bug with netconsole
...
This may lead to situations, when each of two proc entries produce
data for the other's device.
Looks like a BUG, so this patch is for net-2.6. It will not apply to
net-2.6.26 since dev->nd_net access is replaced with dev_net(dev)
one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we encounter an error while looking up the dst the second
time we need to drop the first dst. This patch is pretty much
the same as the one for IPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
From RFC341:
A temporary address is created only if this calculated Preferred
Lifetime is greater than REGEN_ADVANCE time units. In particular, an
implementation must not create a temporary address with a zero
Preferred Lifetime.
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a prefix information from a routeur, only update the
lifetimes of the temporary address associated with that prefix.
Otherwise if one deprecated prefix is advertized, all your temporary
addresses will become deprecated.
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
rose_release() doesn't release sockets properly, e.g. it skips
sock_orphan(), so OOPSes are triggered in sock_def_write_space(),
which was observed especially while ROSE skbs were kfreed from
ax25_frames_acked(). There is also sock_hold() and lock_sock() added -
similarly to ax25_release(). Thanks to Bernard Pidoux for substantial
help in debugging this problem.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Reported-and-tested-by: Bernard Pidoux <bpidoux@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows cleaner code when accesing bss->mesh_config components.
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The bug shows up with CONFIG_PREEMPT enabled. Pointed out by Andrew Morton.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A variable 'i' is being shadowed by another one, but the second
one can just be removed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Vladimir Koutny <vlado@work.ksp.sk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the IBSS code tries to flush the STA list, it does so in
an atomic context. Flushing isn't safe there, however, and
requires the RTNL, so we need to defer it to a workqueue.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Calling sta_info_destroy() doesn't require RCU-synchronisation
before-hand because it does that internally. However, it does
require rtnl-locking so insert that where necessary.
Also clean up the code doing it internally to be a bit clearer and
not synchronize twice if keys are configured.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When STA structure insertion fails, it has been allocated but isn't
really alive yet, it isn't reachable by any other code and also can't
yet have much configured. This patch changes the code so that when
the insertion fails, the resulting STA pointer is no longer valid
because it is freed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
sta_info_destroy(NULL) should be valid, but currently isn't because
the argument is dereferenced before the NULL check. There are no
users that currently pass in NULL, i.e. all check before calling the
function, but I want to change that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When joining a new IBSS, all old stations are flushed, but currently
all stations belonging to all virtual interfaces are flushed, which
is wrong. This patch fixes it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This bool causes my gcc-4.1.0 alpha cross compiler to go into an infinite
loop. Switching it to u8 works around that.
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ERP IE bit for preamble mode is 0 for short and 1 for long, not the other
way around. This fixes the value reported to the driver via
bss_conf->use_short_preamble field.
Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ieee80211_sta_work is disabled while network interface
is down. Therefore, if you configure wireless parameters
before bringing the interface up, these configurations are
not yet effective and association fails.
A workaround from userspace is calling a command like
'iwconfig wlan0 ap any' after the interface is brought up.
To fix this behaviour, trigger execution of ieee80211_sta_work from
ieee80211_open when in STA or IBSS mode.
Signed-off-by: Jan Niehusmann <jan@gondor.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allocate the skb for llc responses with the received packet size by
using the size adjustable llc_frame_alloc.
Don't allocate useless extra payload.
Cleanup magic numbers.
So, this fixes oops.
Reported by Jim Westfall:
kernel: skb_over_panic: text:c0541fc7 len:1000 put:997 head:c166ac00 data:c166ac2f tail:0xc166b017 end:0xc166ac80 dev:eth0
kernel: ------------[ cut here ]------------
kernel: kernel BUG at net/core/skbuff.c:95!
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do with the sockstat6 file what we've already done for the sockstat.
Same good side effect - ipv6 reassembling stats are now shown per-net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Besides, now we can see per-net fragments statistics in the
same file, since this stats is already per-net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently they live in init_net only, but now almost all the info
they can provide is available per-net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Such an accounting would cost us two more dereferences to get the
percpu variable from the struct net, so I make sock_prot_inuse_get
and _add calls work differently depending on CONFIG_NET_NS - without
it old optimized routines are used.
The per-cpu counter for init_net is prepared in core_initcall, so
that even af_inet, that starts as fs_initcall, will already have the
init_net prepared.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This counter is about to become per-proto-and-per-net, so we'll need
two arguments to determine which cell in this "table" to work with.
All the places, but proc already pass proper net to it - proc will be
tuned a bit later.
Some indentation with spaces in proc files is done to keep the file
coding style consistent.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's already some stuff on the struct net, that should better
be folded into netns_core structure. I'm making the per-proto inuse
counter be per-net also, which is also a candidate for this, so
introduce this structure and populate it a bit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In ip6_mc_input(), we need to check whether we have listener(s) for
the packet.
After commit ae7bf20a63, all packets
for multicast destinations are delivered to upper layer if
IFF_PROMISC or IFF_ALLMULTI is set.
In fact, bug was rather ancient; the original (before the commit)
intent of the dev->flags check was to skip the ipv6_chk_mcast_addr()
call, assuming L2 filters packets appropriately, but it was even not
true.
Let's explicitly check our multicast list.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace seq_open with seq_open_net and remove udp_seq_release
completely. seq_release_net will do this job just fine.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to create seq_operations for each instance of 'netstat'.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
udp_proc_register/udp_proc_unregister are called with a static pointer only.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An uppercut - do not use the pcounter on struct proto.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Constructive part of the set is finished here. We have to remove the
pcounter, so start with its init and free functions.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And redirect sock_prot_inuse_add and _get to use one.
As far as the dereferences are concerned. Before the patch we made
1 dereference to proto->inuse.add call, the call itself and then
called the __get_cpu_var() on a static variable. After the patch we
make a direct call, then one dereference to proto->inuse_idx and
then the same __get_cpu_var() on a still static variable. So this
patch doesn't seem to produce performance penalty on SMP.
This is not per-net yet, but I will deliberately make NET_NS=y case
separated from NET_NS=n one, since it'll cost us one-or-two more
dereferences to get the struct net and the inuse counter.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inuse counters are going to become a per-cpu array. Introduce an
index for this array on the struct proto.
To handle the case of proto register-unregister-register loop the
bitmap is used. All its bits manipulations are protected with
proto_list_lock and a sanity check for the bitmap being exhausted is
also added.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.
Done for include/net and net
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kill unnecessary llc_station_mac_sa.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
discard llc packet which has bogus packet length.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qdisc_run loop is currently unbounded and runs entirely in a
softirq. This is bad as it may create an unbounded softirq run.
This patch fixes this by calling need_resched and breaking out if
necessary.
It also adds a break out if the jiffies value changes since that would
indicate we've been transmitting for too long which starves other
softirqs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9af3912ec9 ("[NET] Move DF check
to ip_forward") added a new check to send ICMP fragmentation needed
for large packets.
Unlike the check in ip_finish_output(), it doesn't check for GSO.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The older RW_LOCK_UNLOCKED macros defeat lockdep state tracing so
replace them with the newer __RW_LOCK_UNLOCKED macros.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our interest is not the whole entry of proxy neighbor but the
NTF_ROUTER flag. Let's test it explicitly.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Extract hash function for pneigh entries from pneigh_lookup(),
__pneigh_lookup() and pneigh_delete() as pneigh_hash().
Extract core of pneigh_lookup() and __pneigh_lookup() as
__pneigh_lookup_1().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LLC currently allows users to inject raw frames, including IP packets
encapsulated in SNAP. While Linux doesn't handle IP over SNAP, other
systems do. Restrict LLC sockets to root similar to packet sockets.
[ Modified Patrick's patch to use CAP_NEW_RAW --DaveM ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With a was number of callsites sctp_add_cmd_sf wrapper bloats
kernel by some amount. Due to unlikely tracking allyesconfig,
with the initial result were around ~7kB (thus caught my
attention) while a non-debug config produced only ~2.3kB effect.
I (ij) proposed first a patch to uninline it but Vlad responded
with a patch that removed the only sctp_add_cmd call which is
wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
I did minor cleanup to Vlad's patch.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This elliminates infamous race during module loading when one could lookup
proc entry without proc_fops assigned.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ESP does not account for the IV size when calling pskb_may_pull() to
ensure everything it accesses directly is within the linear part of a
potential fragment. This results in a BUG() being triggered when the
both the IPv4 and IPv6 ESP stack is fed with an skb where the first
fragment ends between the end of the esp header and the end of the IV.
This bug was found by Dirk Nehring <dnehring@gmx.net> .
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reorders some fields in various structures to have
less padding within the structures, making them smaller. It
doesn't yet make any type adjustments, but often size_t is used
for example for IE lengths which is total overkill since size_t
will be 8 bytes long on 64-bit yet the length can at most fill
a u8.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch alters the A-MPDU MLME in sta_info to use dynamic allocation,
thus drastically improving memory usage - from a constant ~2 Kbyte in
the previous (static) allocation to a lower limit of ~200 Byte and an upper
limit of ~2 Kbyte.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes ieee80211_get_channel a static inline defined in
cfg80211's header file which simply calls __ieee80211_get_channel
to avoid symbol clashes with the ieee80211 code.
The problem was pointed out by David Miller, thanks!
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch eliminate the use of buf_size as a trigger in favor of a new
flag to control Rx A-MPDU sessions through debugfs
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise, 'iwconfig wlan0 key off' with no key set results in:
Error for wireless request "Set Encode" (8B2A) :
SET failed on device wlan0 ; No such file or directory.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make /proc/net/ip6_flowlabel show only flow labels belonging to the
current network namespace.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new member, fl_net, in struct ip6_flowlabel.
This allows to create labels with the same value in different namespaces.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the network namespace information to have this protocol to
handle several network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 BEET output function is incorrectly including the inner
header in the payload to be protected. This causes a crash as
the packet doesn't actually have that many bytes for a second
header.
The IPv4 BEET output on the other hand is broken when it comes
to handling an inner IPv6 header since it always assumes an
inner IPv4 header.
This patch fixes both by making sure that neither BEET output
function touches the inner header at all. All access is now
done through the protocol-independent cb structure. Two new
attributes are added to make this work, the IP header length
and the IPv4 option length. They're filled in by the inner
mode's output function.
Thanks to Joakim Koskela for finding this problem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commits f3db4851 ([NETNS][IPV6] ip6_fib - fib6_clean_all handle several
network namespaces) and 69ddb805 ([NETNS][IPV6] route6 - Make proc entry
/proc/net/rt6_stats per namespace) made some proc files per net.
Both of them introduced potential OOPS - get_proc_net can return NULL, but
this check is lost - and a struct net leak - in case single_open() fails the
previously got net is not put.
Kill all these bugs with one patch.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates an unnecessary poll-related routine
by merging it into TIPC's main polling routine, and updates
the comments associated with this code.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently each vlan_groupd contains 8 pointers on arrays with 512
pointers on struct net_device each :) Such a construction "in many
cases ... wastes memory".
My proposal is to allow for some of these arrays pointers be NULL,
meaning that there are no devices in it. When a new device is added
to the vlan_group, the appropriate array is allocated.
The check in vlan_group_get_device's is safe, since the pointer
vg->vlan_devices_arrays[x] can only switch from NULL to not-NULL.
The vlan_group_prealloc_vid() is guarded with rtnl lock and is
also safe.
I've checked (I hope that) all the places, that use these arrays
and found, that the register_vlan_dev is the only place, that can
put a vlan device on an empty vlan_group.
Rough calculations shows, that after the patch a setup with a
single vlan dev (or up to 512 vlans with sequential vids) will
occupy approximately 8 times less memory.
The question I have is - does this patch makes sense, or a totally
new structures are required to store the vlan_devs?
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.
RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.
WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.
The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.
Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tested-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8b7817f3a9 ([IPSEC]: Add ICMP host
relookup support) introduced some dst leaks on error paths: the rt
pointer can be forgotten to be put. Fix it bu going to a proper label.
Found after net namespace's lo refused to unregister :) Many thanks to
Den for valuable help during debugging.
Herbert pointed out, that xfrm_lookup() will put the rtable in case
of error itself, so the first goto fix is redundant.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Given that there are no apparent calls to lock_kernel() or
unlock_kernel() under net/ax25, delete the TODO reference related to
that.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list
method to determine whether it supports multicast. Drivers implementing
secondary unicast support use set_rx_mode however.
Check for both dev->set_multicast_mode and dev->set_rx_mode to determine
multicast capabilities.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mostly re-uses the net, used in icmp netnsization patches from Denis.
After this ICMP sysctls are completely virtualized.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some flesh to ipv4_sysctl_init_net and ipv4_sysctl_exit_net,
i.e. copy the table, alter .data pointers and register it per-net.
Other ipv4_table's sysctls are now global, but this is going to
change once sysctl permissions patches migrate from -mm tree to
mainline in 2.6.26 merge window :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialization is moved to icmp_sk_init, all the places, that
refer to them use init_net for now.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes adding pernet_operations, empty init and exit
hooks and a bit of changes in sysctl_ipv4_init just not to
have this part in next patches.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It should be a "struct ktermios" not a "struct termios".
Based upon a build warning reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity
or dev_change_flags. Setting it directly causes two unwanted effects:
- the next dev_change_flags call will notice a difference between
dev->gflags and the actual flags, enable promisc/allmulti
mode and incorrectly update dev->gflags
- this keeps the underlying device in promisc/allmulti mode until
the VLAN device is deleted
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize call routing between NATed endpoints: when an external
registrar sends a media description that contains an existing RTP
expectation from a different SNATed connection, the gatekeeper
is trying to route the call directly between the two endpoints.
We assume both endpoints can reach each other directly and
"un-NAT" the addresses, which makes the media stream go between
the two endpoints directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for multiple media channels and use it to create
expectations for video streams when present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SDP connection addresses may be contained in the payload multiple
times (in the session description and/or once per media description),
currently only the session description is properly updated. Split up
SDP mangling so the function setting up expectations only updates the
media port, update connection addresses from media descriptions while
parsing them and at the end update the session description when the
final addresses are known.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create expectations for the RTCP connections in addition to RTP connections.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Media streams can come from anywhere, add a module parameter which
controls whether wildcard expectations or expectations between the
two signalling endpoints are created.
Since the same media description sent on multiple connections may
results in multiple identical expections when using a wildcard source,
we need to check whether a similar expectation already exists for a
different connection before attempting to register it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create expectations for incoming signalling connections when seeing
a REGISTER request. This is needed when the registrar uses a
different source port number for signalling messages and for receiving
incoming calls from other endpoints than the registrar.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SIP message may contain multiple Contact: addresses referring to
the NATed endpoint, translate all of them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update maddr=, received= and rport= Via-header parameters refering to
the signalling connection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce URI and header parameter parsing helpers. These are needed
by the conntrack helper to parse expiration values in Contact: header
parameters and by the NAT helper to properly update the Via-header
rport=, received= and maddr= parameters.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flush the RTP expectations we've created when a call is hung up or
terminated otherwise.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perform NAT last after parsing the packet. This makes no difference
currently, but is needed when dealing with registrations to make
sure we seen the unNATed addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for per-method request/response handlers and perform SDP
parsing for INVITE/UPDATE requests and for all informational and
successful responses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the URI parsing helper to get the numerical addresses and get rid of the
text based header translation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a helper function to parse a SIP-URI in a header value, optionally
iterating through all headers of this kind.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce new function for SIP header parsing that properly deals with
continuation lines and whitespace in headers and use it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The request URI is not a header and needs to be treated differently than
real SIP headers. Add a seperate function for parsing it and get rid of
the POS_REQ_URI/POS_REG_REQ_URI definitions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
SDP and SIP headers are quite different, SIP can have continuation lines,
leading and trailing whitespace after the colon and is mostly case-insensitive
while SDP headers always begin on a new line and are followed by an equal
sign and the value, without any whitespace.
Introduce new SDP header parsing function and convert all users that used
the SIP header parsing function. This will allow to properly deal with the
special SIP cases in the SIP header parsing function later.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace sizeof/memcmp by strlen/strcmp. Use case-insensitive comparison
for SIP methods and the SIP/2.0 string, as specified in RFC 3261.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack reference and ctinfo can be derived from the packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After mangling the packet, the pointer to the data and the length of the data
portion may change and need to be adjusted.
Use double data pointers and a pointer to the length everywhere and add a
helper function to the NAT helper for performing the adjustments.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
"limit" marks the first character outside the bounds.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set up the destination NAT mapping before the source NAT
mapping, so the NAT core gets to see the final tuple and can decide
whether the source port needs to be remapped.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce expectation classes and policies. An expectation class
is used to distinguish different types of expectations by the
same helper (for example audio/video/t.120). The expectation
policy is used to hold the maximum number of expectations and
the initial timeout for each class.
The individual classes are isolated from each other, which means
that for example an audio expectation will only evict other audio
expectations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is useful for the SIP helper and signalling expectations.
We don't want to create a full-blown expectation with a wildcard
as source based on a single UDP packet, but need to know the
final port anyways. With inactive expectations we can register
the expectation and reserve the tuple, but wait for confirmation
from the registrar before activating it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With nf_conntrack DUMP_TUPLE got renamed to NF_CT_DUMP_TUPLE, fix
CLUSTERIP to use the proper macro name.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default WMM params have to be set according to beacon/probe response
information prior to authentication (or IBSS start/join); beacon queue
is configured only in IBSS. This does not affect the use of 'real' WMM
params as reported by AP.
Signed-off-by: Vladimir Koutny <vlado@ksp.sk>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Postpone calling ieee80211_hw_config if hardware scanning is active.
This is similar to solution for software scanning where channel setting
is delayed until scan complete.
Signed-off-by: Mohamed Abbas <mohamed.abbas@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a clean tear down for all block ack sessions if interface
goes down or if a deauthentication is done.
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch also fixes the Rx timer's comments
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a wrong debug print when receiving delba
Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When you have an AP on channel 13, it will currently often enough
be listed in scan results even when the regulatory domain restricts
to channels 1-11. This is due to channel overlap. To avoid getting
very strange failures, don't show such APs in the scan results. The
failure mode will now go from "I can see the AP but not associate"
to "I can't see the AP although I know it's there" which is easier
to debug.
This problem was first really noticed by Jes Sorensen.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use the new ieee80211_get_channel() function instead of open-coding it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add ieee80211_get_channel() which gets you a channel struct for a
specific wiphy if that channel is present in that wiphy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to send a phase1 key for TKIP
decryption.
This is needed for drivers that don't do the rekeying by themselves
(i.e. iwlwifi). Upon IV16 wrap around, the packet is decrypted in SW,
if decryption is ok, mac80211 calls to update_tkip_key with a new
phase 1 RX key.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 able to compute a TKIP key from an skb.
The requested key can be a phase 1 or a phase 2 key.
This is useful for drivers who need to provide tkip key to their
HW to enable HW encryption.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce an inline net_eq() to compare two namespaces.
Without CONFIG_NET_NS, since no namespace other than &init_net
exists, it is always 1.
We do not need to convert 1) inline vs inline and
2) inline vs &init_net comparisons.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce neigh_parms/pneigh_entry inlines: neigh_parms_net(), pneigh_net().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Without CONFIG_NET_NS, no namespace other than &init_net exists,
no need to store net in seq_net_private.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Last part of hop-limit determination is always:
hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
if (hoplimit < 0)
hoplimit = ipv6_get_hoplimit(dst->dev).
Let's consolidate it as ip6_dst_hoplimit(dst).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Each MIPv6 XFRM state (DSTOPT/RH2) holds either destination or source
address to be mangled in the IPv6 header (that is "CoA").
On Inter-MN communication after both nodes binds each other,
they use route optimized traffic two MIPv6 states applied, and
both source and destination address in the IPv6 header
are replaced by the states respectively.
The packet format is correct, however, next-hop routing search
are not.
This patch fixes it by remembering address pairs for later states.
Based on patch from Masahide NAKAMURA <nakam@linux-ipv6.org>.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Allow to create sockets in the namespace if the protocol ok with this.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP layer now can handle multiple namespaces normally. So, process such
packets normally and drop them only if the transport layer is not
aware about namespaces.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were no packets in the namespace other than initial
previously. This will be changed in the neareast future. Netfilters
are not namespace aware and should be processed in the initial
namespace only for now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all the reast of the init_net with a proper net on the socket
layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all the rest of the init_net with a proper net on the IP layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options_compile uses inet_addr_type which requires a namespace. The
packet argument is optional, so parameter is the only way to obtain
it. Pass the init_net there for now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Seqfile operation showing /proc/net/arp are already namespace
aware. All we need is to register this file for each namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get namespace from a device and pass it to the routing engine. Enable
ARP packet processing and device notifiers after that.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This file displays the registered packet types, but some of them
(packet sockets creates such) can be bound to a net device and showing
them in a wrong namespace is not correct.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP-Lite sockets are displayed in another files, rather than
UDP ones, so make the present in namespaces as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just introduce a helper to remove ifdefs from inside the
udplite4_register function. This will help to make the next patch
nicer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the commit f40c8174d3 ([NETNS][IPV4]
tcp - make proc handle the network namespaces) it is now possible to make
this file present in newly created namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the commit a91275eff4 ([NETNS][IPV6]
udp - make proc handle the network namespace) it is now possible to make
this file present in newly created namespaces.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proxy neighbors do not have any reference counting, so any caller
of pneigh_lookup (unless it's a netlink triggered add/del routine)
should _not_ perform any actions on the found proxy entry.
There's one exception from this rule - the ipv6's ndisc_recv_ns()
uses found entry to check the flags for NTF_ROUTER.
This creates a race between the ndisc and pneigh_delete - after
the pneigh is returned to the caller, the nd_tbl.lock is dropped
and the deleting procedure may proceed.
One of the fixes would be to add a reference counting, but this
problem exists for ndisc only. Besides such a patch would be too
big for -rc4.
So I propose to introduce a __pneigh_lookup() which is supposed
to be called with the lock held and use it in ndisc code to check
the flags on alive pneigh entry.
Changes from v2:
As David noticed, Exported the __pneigh_lookup() to ipv6 module.
The checkpatch generates a warning on it, since the EXPORT_SYMBOL
does not follow the symbol itself, but in this file all the
exports come at the end, so I decided no to break this harmony.
Changes from v1:
Fixed comments from YOSHIFUJI - indentation of prototype in header
and the pndisc_check_router() name - and a compilation fix, pointed
by Daniel - the is_routed was (falsely) considered as uninitialized
by gcc.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
sch_htb: fix "too many events" situation
connector: convert to single-threaded workqueue
[ATM]: When proc_create() fails, do some error handling work and return -ENOMEM.
[SUNGEM]: Fix NAPI assertion failure.
BNX2X: prevent ethtool from setting port type
[9P] net/9p/trans_fd.c: remove unused variable
[IPV6] net/ipv6/ndisc.c: remove unused variable
[IPV4] fib_trie: fix warning from rcu_assign_poinger
[TCP]: Let skbs grow over a page on fast peers
[DLCI]: Fix tiny race between module unload and sock_ioctl.
[SCTP]: Fix build warnings with IPV6 disabled.
[IPV4]: Fix null dereference in ip_defrag
The iWARP protocol limits RDMA read requests to a single scatter
entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.
In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:
svcrdma: Error posting send = -22
svcrdma : RDMA_READ error = -22
and things go downhill from there.
The trivial fix is to delete the #ifdef guard. The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended. In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.
Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:
ld: final link failed: Software caused connection abort
With the patch applied I was able to complete a kernel build on the
same setup.
(Tom Tucker says this is "actually an _ancient_ remnant when it had to
compile against iWARP vs. non-iWARP enabled OFA trees.")
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sctp_datamsg_free and sctp_datamsg_track are just aliases for
sctp_datamsg_put and sctp_chunk_hold, respectively.
Saves 32 Bytes on x86.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make /proc/net/fib_trie and /proc/net/fib_triestat display
all routing tables, not just local and main.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the first u32 copied from syncookie_secret is overwritten by the
minute-counter four lines below. After adjusting the destination
address, the size of syncookie_secret can be reduced accordingly.
AFAICS, the only other user of syncookie_secret[] is the ipv6
syncookie support. Because ipv6 syncookies only grab 44 bytes from
syncookie_secret[], this shouldn't affect them in any way.
With fixes from Glenn Griffin.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HTB is event driven algorithm and part of its work is to apply
scheduled events at proper times. It tried to defend itself from
livelock by processing only limited number of events per dequeue.
Because of faster computers some users already hit this hardcoded
limit.
This patch limits processing up to 2 jiffies (why not 1 jiffie ?
because it might stop prematurely when only fraction of jiffie
remains).
Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patches removes unused code in ndisc_send_redirect() method in
net/ipv6/ndisc.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable cb is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable hlen is initialized but never used otherwise.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
identifier i;
constant C;
@@
(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This gets rid of a warning caused by the test in rcu_assign_pointer.
I tried to fix rcu_assign_pointer, but that devolved into a long set
of discussions about doing it right that came to no real solution.
Since the test in rcu_assign_pointer for constant NULL would never
succeed in fib_trie, just open code instead.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The route table parameters are set based on system memory and sysctl
values that almost never change. Also the genid only changes every
10 minutes.
RTprint is defined by never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sorry for the patch sequence confusion :| but I found that the similar
thing can be done for raw sockets easily too late.
Expand the proto.h union with the raw_hashinfo member and use it in
raw_prot and rawv6_prot. This allows to drop the protocol specific
versions of hash and unhash callbacks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After this we have only udp_lib_get_port to get the port and two
stubs for ipv4 and ipv6. No difference in udp and udplite except
for initialized h.udp_hash member.
I tried to find a graceful way to drop the only difference between
udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
routine), but adding one more callback on the struct proto didn't
appear such :( Maybe later.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inspired by the commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4
and -v6 protocols.
The result is not that exciting, but it removes some levels of
indirection in udpxxx_get_port and saves some space in code and text.
The first step is to union existing hashinfo and new udp_hash on the
struct proto and give a name to this union, since future initialization
of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc
warning about inability to initialize anonymous member this way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes code a bit more uniform and straigthforward.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_options->is_data is assigned only and never checked. The structure is
not a part of kernel interface to the userspace. So, it is safe to remove
this field.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is the only way to reach ip_options compile with opt != NULL:
ip_options_get_finish
opt->is_data = 1;
ip_options_compile(opt, NULL)
So, checking for is_data inside opt != NULL branch is not needed.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>