linux

Author	SHA1	Message	Date
Williams, Mitch A	ebc08a6f47	rtnetlink: Add VF config code to rtnetlink Add code to allow rtnetlink clients to query and set VF information through the PF driver. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-12 16:56:08 -08:00
Alexey Dobriyan	2c8c1e7297	net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-17 19:16:02 -08:00
Eric W. Biederman	d90a909e1f	net: Fix userspace RTM_NEWLINK notifications. I received some bug reports about userspace programs having problems because after RTM_NEWLINK was received they could not immediate access files under /proc/sys/net/ because they had not been registered yet. The original problem was trivially fixed by moving the userspace notification from rtnetlink_event() to the end of register_netdevice(). When testing that change I discovered I was still getting RTM_NEWLINK events before I could access proc and I was also getting RTM_NEWLINK events after I was seeing RTM_DELLINK. Things practically guaranteed to confuse userspace. After a little more investigation these extra notifications proved to be from the new notifiers NETDEV_POST_INIT and NETDEV_UNREGISTER_BATCH hitting the default case in rtnetlink_event, and triggering unnecessary RTM_NEWLINK messages. rtnetlink_event now explicitly handles NETDEV_UNREGISTER_BATCH and NETDEV_POST_INIT to avoid sending the incorrect userspace notifications. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-13 19:45:22 -08:00
Eric W. Biederman	81adee47df	net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2009-11-08 00:53:51 -08:00
Eric Dumazet	e0d087af72	rtnetlink: Cleanups Pure cleanups patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-07 01:26:17 -08:00
Eric Dumazet	23289a37e2	net: add a list_head parameter to dellink() method Adding a list_head parameter to rtnl_link_ops->dellink() methods allow us to queue devices on a list, in order to dismantle them all at once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-28 02:22:07 -07:00
Eric Dumazet	7c28bd0b8e	rtnetlink: speedup rtnl_dump_ifinfo() When handling large number of netdevice, rtnl_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table. This considerably speedups "ip link" operations Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-24 06:13:17 -07:00
Eric Dumazet	a3d1289126	rtnetlink: rtnl_setlink() and rtnl_getlink() changes rtnl_getlink() & rtnl_setlink() run with RTNL held, we can use __dev_get_by_index() and __dev_get_by_name() variants and avoid dev_hold()/dev_put() Adds to rtnl_getlink() the capability to find a device by its name, not only by its index. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-22 04:33:48 -07:00
Patrick McHardy	af356afa01	net_sched: reintroduce dev->qdisc for use by sch_api Currently the multiqueue integration with the qdisc API suffers from a few problems: - with multiple queues, all root qdiscs use the same handle. This means they can't be exposed to userspace in a backwards compatible fashion. - all API operations always refer to queue number 0. Newly created qdiscs are automatically shared between all queues, its not possible to address individual queues or restore multiqueue behaviour once a shared qdisc has been attached. - Dumps only contain the root qdisc of queue 0, in case of non-shared qdiscs this means the statistics are incomplete. This patch reintroduces dev->qdisc, which points to the (single) root qdisc from userspace's point of view. Currently it either points to the first (non-shared) default qdisc, or a qdisc shared between all queues. The following patches will introduce a classful dummy qdisc, which will be used as root qdisc and contain the per-queue qdiscs as children. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:03 -07:00
Eric Dumazet	2e59af3dcb	vlan: multiqueue vlan device vlan devices are currently not multi-queue capable. We can do that with a new rtnl_link_ops method, get_tx_queues(), called from rtnl_create_link() This new method gets num_tx_queues/real_num_tx_queues from real device. register_vlan_device() is also handled. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-02 18:03:00 -07:00
Johannes Berg	30ffee8480	net: move and export get_net_ns_by_pid The function get_net_ns_by_pid(), to get a network namespace from a pid_t, will be required in cfg80211 as well. Therefore, let's move it to net_namespace.c and export it. We can't make it a static inline in the !NETNS case because it needs to verify that the given pid even exists (and return -ESRCH). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-12 14:03:28 -07:00
Pablo Neira Ayuso	1ce85fe402	netlink: change nlmsg_notify() return value logic This patch changes the return value of nlmsg_notify() as follows: If NETLINK_BROADCAST_ERROR is set by any of the listeners and an error in the delivery happened, return the broadcast error; else if there are no listeners apart from the socket that requested a change with the echo flag, return the result of the unicast notification. Thus, with this patch, the unicast notification is handled in the same way of a broadcast listener that has set the NETLINK_BROADCAST_ERROR socket flag. This patch is useful in case that the caller of nlmsg_notify() wants to know the result of the delivery of a netlink notification (including the broadcast delivery) and take any action in case that the delivery failed. For example, ctnetlink can drop packets if the event delivery failed to provide reliable logging and state-synchronization at the cost of dropping packets. This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-24 23:18:28 -08:00
Stephen Hemminger	eeda3fd64f	netdev: introduce dev_get_stats() In order for the network device ops get_stats call to be immutable, the handling of the default internal network device stats block has to be changed. Add a new helper function which replaces the old use of internal_get_stats. Note: change return code to make it clear that the caller should not go changing the returned statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 21:40:23 -08:00
Stephen Hemminger	d314774cf2	netdev: network device operations infrastructure This patch changes the network device internal API to move adminstrative operations out of the network device structure and into a separate structure. This patch involves some hackery to maintain compatablity between the new and old model, so all 300+ drivers don't have to be changed at once. For drivers that aren't converted yet, the netdevice_ops virt function list still resides in the net_device structure. For old protocols, the new net_device_ops are copied out to the old net_device pointers. After the transistion is completed the nag message can be changed to an WARN_ON, and the compatiablity code can be made configurable. Some function pointers aren't moved: * destructor can't be in net_device_ops because it may need to be referenced after the module is unloaded. * neighbor setup is manipulated in a couple of places that need special consideration * hard_start_xmit is in the fast path for transmit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 21:32:24 -08:00
Johannes Berg	5f9021cfdc	rtnetlink: propagate error from dev_change_flags in do_setlink() Unlike ifconfig, iproute doesn't report an error when setting an interface up fails: (example: put wireless network mac80211 interface into repeater mode with iwconfig but do not set a peer MAC address, it should fail with -ENOLINK) without patch: # ip link set wlan0 up ; echo $? 0 # with patch: # ip link set wlan0 up ; echo $? RTNETLINK answers: Link has been severed 2 # Propagate the return value from dev_change_flags() to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-16 23:20:31 -08:00
Johannes Berg	95a5afca4a	net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-16 15:24:51 -07:00
David S. Miller	4dd565134e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c drivers/net/e1000e/netdev.c	2008-10-08 14:56:41 -07:00
Herbert Xu	58ec3b4db9	net: Fix netdev_run_todo dead-lock Benjamin Thery tracked down a bug that explains many instances of the error unregister_netdevice: waiting for %s to become free. Usage count = %d It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows a RTNL section, and that todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 15:50:03 -07:00
Stephen Hemminger	0b815a1a6d	net: network device name ifalias support This patch add support for keeping an additional character alias associated with an network interface. This is useful for maintaining the SNMP ifAlias value which is a user defined value. Routers use this to hold information like which circuit or line it is connected to. It is just an arbitrary text label on the network device. There are two exposed interfaces with this patch, the value can be read/written either via netlink or sysfs. This could be maintained just by the snmp daemon, but it is more generally useful for other management tools, and the kernel is good place to act as an agreed upon interface to store it. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 21:28:11 -07:00
David S. Miller	e8a0464cc9	netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:00 -07:00
David S. Miller	b0e1e6462d	netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:42:10 -07:00
David S. Miller	65b53e4cc9	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c drivers/net/wireless/rt2x00/rt2x00dev.c net/mac80211/ieee80211_i.h	2008-06-10 02:22:26 -07:00
Thomas Graf	bc3ed28caa	netlink: Improve returned error codes Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:54 -07:00
Pavel Emelyanov	96e74088f1	net: The dev->get_stats pointer is not NULL nowadays. And so does the pointer is returns, but sysfs and netlinks still check for both cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:12:46 -07:00
Patrick McHardy	c9c1014b2b	[RTNETLINK]: Fix bogus ASSERT_RTNL warning ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is held. This bogus warnings when running in atomic context, which f.e. happens when adding secondary unicast addresses through macvlan or vlan or when synchronizing multicast addresses from wireless devices. Mid-term we might want to consider moving all address updates to process context since the locking seems overly complicated, for now just fix the bogus warning by changing ASSERT_RTNL to use mutex_is_locked(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-23 22:10:48 -07:00
Pavel Emelyanov	669f87baab	[RTNL]: Introduce the rtnl_kill_links helper. This one is responsible for calling ->dellink on each net device found in net to help with vlan net_exit hook in the nearest future. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 00:46:52 -07:00
Pavel Emelyanov	3a931a80cb	[RTNL]: Relax for_each_netdev_safe in __rtnl_link_unregister. Each potential list_del (happening from inside a ->dellink call) is followed by goto restart, so there's no need in _safe iteration. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 00:45:56 -07:00
YOSHIFUJI Hideaki	3b1e0a655f	[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:55 +09:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
Thomas Graf	1840bb13c2	[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK RTM_NEWLINK allows for already existing links to be modified. For this purpose do_setlink() is called which expects address attributes with a payload length of at least dev->addr_len. This patch adds the necessary validation for the RTM_NEWLINK case. The address length for links to be created is not checked for now as the actual attribute length is used when copying the address to the netdevice structure. It might make sense to report an error if less than addr_len bytes are provided but enforcing this might break drivers trying to be smart with not transmitting all zero addresses. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-23 19:54:36 -08:00
Thomas Graf	76e87306c2	[RTNL]: Add missing link netlink attribute policy definitions IFLA_LINK is no longer a write-only attribute on the kernel side and must thus be validated. Same goes for the newly introduced IFLA_LINKINFO. Fixes undefined behaviour if either of the attributes are not well formed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 16:12:08 -08:00
David S. Miller	93b2d4a208	Revert "[RTNETLINK]: Send a single notification on device state changes." This reverts commit `45b5035482`. It break locking around dev->link_mode as well as cause other bootup problems. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 18:35:07 -08:00
Laszlo Attila Toth	45b5035482	[RTNETLINK]: Send a single notification on device state changes. In do_setlink() a single notification is sent at the end of the function if any modification occured. If the address has been changed, another notification is sent. Both of them is required because originally only the NETDEV_CHANGEADDR notification was sent and although device state change implies address change, some programs may expect the original notification. It remains for compatibity. If set_operstate() is called from do_setlink(), it doesn't send a notification, only if it is called from rtnl_create_link() as earlier. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 22:42:09 -08:00
Adrian Bunk	03245ce2f0	[NET] rtnetlink.c: remove no longer used functions This patch removes the following no longer used functions: - rtattr_parse() - rtattr_strlcpy() - __rtattr_parse_nested_compat() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:17:22 -08:00
Denis V. Lunev	775516bfa2	[NETNS]: Namespace stop vs 'ip r l' race. During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The raplacement of the correct netns for init_ns solves the problem only partial as socket to be stoped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the referrence for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:08 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	4f84d82f7a	[NETNS]: Memory leak on network namespace stop. Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:06 -08:00
Eric W. Biederman	4b3da706bb	[NET]: Make the netlink methods in rtnetlink handle multiple network namespaces After the previous prep work this just consists of removing checks limiting the code to work in the initial network namespace, and updating rtmsg_ifinfo so we can generate events for devices in something other then the initial network namespace. Referring to network other network devices like the IFLA_LINK and IFLA_MASTER attributes do, gets interesting if those network devices happen to be in other network namespaces. Currently ifindex numbers are allocated globally so I have taken the path of least resistance and not still report the information even though the devices they are talking about are invisible. If applications start getting confused or when ifindex numbers become local to the network namespace we may need to do something different in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:26 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
Patrick McHardy	68365458a4	[NET]: rtnl_link: fix use-after-free When unregistering the rtnl_link_ops, all existing devices using the ops are destroyed. With nested devices this may lead to a use-after-free despite the use of for_each_netdev_safe() in case the upper device is next in the device list and is destroyed by the NETDEV_UNREGISTER notifier. The easy fix is to restart scanning the device list after removing a device. Alternatively we could add new devices to the front of the list to avoid having dependant devices follow the device they depend on. A third option would be to only restart scanning if dev->iflink of the next device matches dev->ifindex of the current one. For now this seems like the safest solution. With this patch, the veth rtnl_link_ops unregistration can use rtnl_link_unregister() directly since it now also handles destruction of multiple devices at once. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-20 20:31:45 -08:00
Eric W. Biederman	ceaa79c434	[NETNS]: Fix get_net_ns_by_pid The pid namespace patches changed the semantics of find_task_by_pid without breaking the compile resulting in get_net_ns_by_pid doing the wrong thing. So switch to using the intended find_task_by_vpid. Combined with Denis' earlier patch to make netlink traffic fully synchronous the inadvertent race I introduced with accessing current is actually removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:56:12 -07:00
Pavel Emelyanov	cf7b708c8d	Make access to task's nsproxy lighter When someone wants to deal with some other taks's namespaces it has to lock the task and then to get the desired namespace if the one exists. This is slow on read-only paths and may be impossible in some cases. E.g. Oleg recently noticed a race between unshare() and the (sent for review in cgroups) pid namespaces - when the task notifies the parent it has to know the parent's namespace, but taking the task_lock() is impossible there - the code is under write locked tasklist lock. On the other hand switching the namespace on task (daemonize) and releasing the namespace (after the last task exit) is rather rare operation and we can sacrifice its speed to solve the issues above. The access to other task namespaces is proposed to be performed like this: rcu_read_lock(); nsproxy = task_nsproxy(tsk); if (nsproxy != NULL) { / * * work with the namespaces here * e.g. get the reference on one of them * / } / * * NULL task_nsproxy() means that this task is * almost dead (zombie) * / rcu_read_unlock(); This patch has passed the review by Eric and Oleg :) and, of course, tested. [clg@fr.ibm.com: fix unshare()] [ebiederm@xmission.com: Update get_net_ns_by_pid] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Denis V. Lunev	cd40b7d398	[NET]: make netlink user -> kernel interface synchronious This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:15:29 -07:00
Denis V. Lunev	1536cc0d55	[NET]: rtnl_unlock cleanups There is no need to process outstanding netlink user->kernel packets during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv anymore. Normal code path is the following: netlink_sendmsg netlink_unicast netlink_sendskb skb_queue_tail netlink_data_ready rtnetlink_rcv mutex_lock(&rtnl_mutex); netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg); mutex_unlock(&rtnl_mutex); So, it is possible, that packets can be present in the rtnl->sk_receive_queue during rtnl_unlock, but there is no need to process them at that moment as rtnetlink_rcv for that packet is pending. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:12:58 -07:00
Herbert Xu	0cfad07555	[NETLINK]: Avoid pointer in netlink_run_queue I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:24 -07:00
Eric W. Biederman	d8a5ec6727	[NET]: netlink support for moving devices between network namespaces. The simplest thing to implement is moving network devices between namespaces. However with the same attribute IFLA_NET_NS_PID we can easily implement creating devices in the destination network namespace as well. However that is a little bit trickier so this patch sticks to what is simple and easy. A pid is used to identify a process that happens to be a member of the network namespace we want to move the network device to. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:13 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	e9dc865340	[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Pavel Emelianov	e71992889e	[RTNETLINK]: Introduce generic rtnl_create_link(). This routine gets the parsed rtnl attributes and creates a new link with generic info (IFLA_LINKINFO policy). Its intention is to help the drivers, that need to create several links at once (like VETH). This is nothing but a copy-paste-ed part of rtnl_newlink() function that is responsible for creation of new device. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device dev, int budget) to int foo_poll(struct napi_struct napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Thomas Graf	8072f085d7	[RTNETLINK]: Fix warning for !CONFIG_KMOD replay label is unused otherwise. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 14:13:50 -07:00
YOSHIFUJI Hideaki	40b77c9434	[NET] CORE: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-07-19 10:43:23 +09:00
Patrick McHardy	0e06877c6f	[RTNETLINK]: rtnl_link: allow specifying initial device address Drivers need to validate the initial addresses in their netlink attribute validation function or manually reject them if they can't support this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-11 19:45:36 -07:00
Patrick McHardy	2d85cba2b2	[RTNETLINK]: rtnl_link API simplification All drivers need to unregister their devices in the module unload function. While doing so they must hold the rtnl and atomically unregister the rtnl_link ops as well. This makes the rtnl_link_unregister function that takes the rtnl itself completely useless. Provide default newlink/dellink functions, make __rtnl_link_unregister and rtnl_link_unregister unregister all devices with matching rtnl_link_ops and change the existing users to take advantage of that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-11 19:45:33 -07:00
Patrick McHardy	2371baa4bd	[RTNETLINK]: Fix rtnetlink compat attribute patch Sent the wrong patch previously. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:40 -07:00
Patrick McHardy	afdc3238ec	[RTNETLINK]: Add nested compat attribute Add a nested compat attribute type that can be used to convert attributes that contain a structure to nested attributes in a backwards compatible way. The attribute looks like this: struct { [ compat contents ] struct rtattr { .rta_len = total size, .rta_type = type, } rta; struct old_structure struct; [ nested top-level attribute ] struct rtattr { .rta_len = nest size, .rta_type = type, } nest_attr; [ optional 0 .. n nested attributes ] struct rtattr { .rta_len = private attribute len, .rta_type = private attribute typ, } nested_attr; struct nested_data data; }; Since both userspace and kernel deal correctly with attributes that are larger than expected old versions will just parse the compat part and ignore the rest. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:39 -07:00
Patrick McHardy	38f7b870d4	[RTNETLINK]: Link creation API Add rtnetlink API for creating, changing and deleting software devices. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:14:20 -07:00
Patrick McHardy	0157f60c0c	[RTNETLINK]: Split up rtnl_setlink Split up rtnl_setlink into a function performing validation and a function performing the actual changes. This allows to share the modifcation logic with rtnl_newlink, which is introduced by the next patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:14:16 -07:00
Patrick McHardy	51055be81c	[RTNETLINK]: ifindex 0 does not exist ifindex == 0 does not exist and implies we should do a lookup by name if one was given. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:11 -07:00
Patrick McHardy	ef7c79ed64	[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:10 -07:00
Patrick McHardy	575c3e2a04	[RTNETLINK]: Remove remains of wireless extensions over rtnetlink Remove some unused variables and function arguments related to the recently removed wireless extensions over rtnetlink. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-22 17:00:49 -07:00
Patrick McHardy	83b496e928	[RTNETLINK]: Allow changing of subsets of netdevice flags in rtnl_setlink rtnl_setlink doesn't allow to change subsets of the flags, just to override the set entirely by a new one. This means that for simply setting a device up or down userspace first needs to query the current flags, change it and send the changed flags back, which is racy and needlessly complicated. Mask the flags using ifi_change since this is what it is intended for. For backwards compatibility treat ifi_change == 0 as ~0 (even though it seems quite unlikely that anyone has been using this so far). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-22 17:00:01 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Adrian Bunk	42bad1da50	[NETLINK]: Possible cleanups. - make the following needlessly global variables static: - core/rtnetlink.c: struct rtnl_msg_handlers[] - netfilter/nf_conntrack_proto.c: struct nf_ct_protos[] - make the following needlessly global functions static: - core/rtnetlink.c: rtnl_dump_all() - netlink/af_netlink.c: netlink_queue_skip() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 00:57:41 -07:00
Johannes Berg	9e101eab15	[WIRELESS]: Remove wext over netlink. As scheduled, this patch removes the pointless wext over netlink code. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:42 -07:00
Stephen Hemminger	3ff50b7997	[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:24 -07:00
Patrick McHardy	6313c1e099	[RTNETLINK]: Remove unnecessary locking in dump callbacks Since we're now holding the rtnl during the entire dump operation, we can remove additional locking for rtnl protected data. This patch does that for all simple cases (dev_base_lock for dev_base walking, RCU protection for FIB rule dumping). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:05 -07:00
Patrick McHardy	1c2d670f36	[RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks Hold rtnl_mutex during the entire netlink dump operation. This allows to simplify locking in the dump callbacks, since they can now rely on that no concurrent changes happen. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:04 -07:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Thomas Graf	038890fed8	[RTNL]: Improve error codes for unsupported operations The most common trigger of these errors is that the config option hasn't been enable wich would make the functionality available. Therefore returning EOPNOTSUPP gives a better idea on what is going wrong. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:34 -07:00
Thomas Graf	c702e8047f	[NETLINK]: Directly return -EINTR from netlink_dump_start() Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, therefore simplying the callers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Thomas Graf	1d00a4eb42	[NETLINK]: Remove error pointer from netlink message handler The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:30 -07:00
Thomas Graf	45e7ae7f71	[NETLINK]: Ignore control messages directly in netlink_run_queue() Changes netlink_rcv_skb() to skip netlink controll messages and don't pass them on to the message handler. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	d35b685640	[NETLINK]: Ignore !NLM_F_REQUEST messages directly in netlink_run_queue() netlink_rcv_skb() is changed to skip messages which don't have the NLM_F_REQUEST bit to avoid every netlink family having to perform this check on their own. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	51057f2fec	[RTNL]: Properly return rntl message handler Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:24 -07:00
Thomas Graf	687ad8cc64	[RTNL]: Use rtnl registration interface for dump-all aliases Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:15 -07:00
Thomas Graf	9d9e6a5819	[NET] rules: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:07 -07:00
Thomas Graf	c8822a4e00	[NEIGH]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:06 -07:00
Thomas Graf	340d17fc9d	[NET] link: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:05 -07:00
Thomas Graf	e284986385	[RTNL]: Message handler registration interface This patch adds a new interface to register rtnetlink message handlers replacing the exported rtnl_links[] array which required many message handlers to be exported unnecessarly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:04 -07:00
Jean Tourrilhes	c2805fbb86	[PATCH] WE-22 : prevent information leak on 64 bit Johannes Berg discovered that kernel space was leaking to userspace on 64 bit platform. He made a first patch to fix that. This is an improved version of his patch. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-03-27 14:10:26 -04:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
YOSHIFUJI Hideaki	4ec93edb14	[NET] CORE: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:25 -08:00
Patrick McHardy	26932566a4	[NETLINK]: Don't BUG on undersized allocations Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:41 -08:00
Thomas Graf	e3703b3de1	[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similiar way, therefore rtnl_put_cacheinfo() is added to reuse code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:44 -08:00
Thomas Graf	6051e2f4fb	[IPv6] prefix: Convert RTM_NEWPREFIX notifications to use the new netlink api RTM_GETPREFIX is completely unused and is thus removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:45 -08:00
Thomas Graf	339bf98ffc	[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:11 -08:00
Patrick McHardy	b974179abe	[RTNETLINK]: Fix use of wrong skb in do_getlink() skb is the netlink query, nskb is the reply message. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-12 01:50:30 -07:00
Eric Sesterhenn	9918f23096	[RTNETLINK]: Possible dereference in net/core/rtnetlink.c another possible dereference spotted by coverity (#cid 1390). if the nlmsg_parse() call fails, we goto errout, where we call dev_put(), with dev still initialized to NULL. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:23 -07:00
Patrick McHardy	78e5b8916e	[RTNETLINK]: Fix netdevice name corruption When changing a device by ifindex without including a IFLA_IFNAME attribute, the ifname variable contains random garbage and is used to change the device name. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:50 -07:00
Thomas Graf	3015d5d4e5	[RTNETLINK]: Fix typo causing wrong skb to be freed A typo introduced by myself which leads to freeing the skb containing the netlink message when it should free the newly allocated skb for the reply. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:43 -07:00
Thomas Graf	5176f91ea8	[NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validation Converts existing NLA_STRING attributes to use the new validation features, saving a couple of temporary buffers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:25 -07:00
David S. Miller	a57d27fc71	[RTNETLINK]: Don't return error on no-metrics. Instead just cancel the nested attribute and return 0. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:40 -07:00
Thomas Graf	2d7202bfdd	[IPv6] route: Convert FIB6 dumping to use new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:13 -07:00
Thomas Graf	56fc85ac96	[RTNETLINK]: Unexport rtnl socket Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:02 -07:00
Thomas Graf	0ec6d3f467	[NET] link: Convert notifications to use rtnl_notify() Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:01 -07:00
Thomas Graf	97676b6b55	[RTNETLINK]: Add rtnetlink notification interface Adds rtnl_notify() to send rtnetlink notification messages and rtnl_set_sk_err() to report notification errors as socket errors in order to indicate the need of a resync due to loss of events. nlmsg_report() is added to properly document the meaning of NLM_F_ECHO. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:50 -07:00
Thomas Graf	2942e90050	[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:48 -07:00
Thomas Graf	b63bbc5006	[NEIGH]: Move netlink neighbour table bits to linux/neighbour.h rtnetlink_rcv_msg() is not longer required to parse attributes for the neighbour tables layer, remove dependency on obsolete and buggy rta_buf. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:04 -07:00
Thomas Graf	9067c722cf	[NEIGH]: Move netlink neighbour bits to linux/neighbour.h Moves netlink neighbour bits to linux/neighbour.h. Also moves bits to be exported to userspace from net/neighbour.h to linux/neighbour.h and removes __KERNEL__ guards, userspace is not supposed to be using it. rtnetlink_rcv_msg() is not longer required to parse attributes for the neighbour layer, remove dependency on obsolete and buggy rta_buf. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:01 -07:00
Thomas Graf	b60c5115f4	[NET]: Convert link dumping to new netlink api Transforms netlink code to dump link tables to use the new netlink api. Makes rtnl_getlink() available regardless of the availability of the wireless extensions. Adding copy_rtnl_link_stats() avoids the structural dependency of struct rtnl_link_stats on struct net_device_stats and thus avoids troubles later on. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:49 -07:00
Thomas Graf	da5e0494c5	[NET]: Convert link modification to new netlink api Transforms do_setlink() into rtnl_setlink() using the new netlink api. A warning message printed to the console is added in the event that a change request fails while part of the change request has been comitted already. The ioctl() based nature of net devices makes it almost impossible to move on to atomic netlink operations without obsoleting some of the functionality. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:48 -07:00
Thomas Graf	1823730fbc	[IPv4]: Move interface address bits to linux/if_addr.h Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:47 -07:00
Thomas Graf	14c0b97ddf	[NET]: Protocol Independant Policy Routing Rules Framework Derived from net/ipv/fib_rules.c Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:40 -07:00
David S. Miller	70f8e78e15	[RTNETLINK]: Fix IFLA_ADDRESS handling. The ->set_mac_address handlers expect a pointer to a sockaddr which contains the MAC address, whereas IFLA_ADDRESS provides just the MAC address itself. So whip up a sockaddr to wrap around the netlink attribute for the ->set_mac_address call. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-08 16:47:37 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Darrel Goeddel	c7bdb545d2	[NETLINK]: Encapsulate eff_cap usage within security framework. This patch encapsulates the usage of eff_cap (in netlink_skb_params) within the security framework by extending security_netlink_recv to include a required capability parameter and converting all direct usage of eff_caps outside of the lsm modules to use the interface. It also updates the SELinux implementation of the security_netlink_send and security_netlink_recv hooks to take advantage of the sid in the netlink_skb_params struct. This also enables SELinux to perform auditing of netlink capability checks. Please apply, for 2.6.18 if possible. Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:57:55 -07:00
Jean Tourrilhes	711e2c33ac	[PATCH] WE-20 for kernel 2.6.16 This is version 20 of the Wireless Extensions. This is the completion of the RtNetlink work I started early 2004, it enables the full Wireless Extension API over RtNetlink. Few comments on the patch : o totally driver transparent, no change in drivers needed. o iwevent were already RtNetlink based since they were created (around 2.5.7). This adds all the regular SET and GET requests over RtNetlink, using the exact same mechanism and data format as iwevents. o This is a Kconfig option, as currently most people have no need for it. Surprisingly, patch is actually small and well encapsulated. o Tested on SMP, attention as been paid to make it 64 bits clean. o Code do probably too many checks and could be further optimised, but better safe than sorry. o RtNetlink based version of the Wireless Tools available on my web page for people inclined to try out this stuff. I would also like to thank Alexey Kuznetsov for his helpful suggestions to make this patch better. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2006-03-23 07:12:57 -05:00
Stephen Hemminger	6756ae4b4e	[NET]: Convert RTNL to mutex. This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and gets rid of some of the leftover legacy. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:23:58 -08:00
Stefan Rompf	b00055aacd	[NET] core: add RFC2863 operstate this patch adds a dormant flag to network devices, RFC2863 operstate derived from these flags and possibility for userspace interaction. It allows drivers to signal that a device is unusable for user traffic without disabling queueing (and therefore the possibility for protocol establishment traffic to flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable without changes to the driver. It is the result of our long discussion. However I must admit that it represents what Jamal and I agreed on with compromises towards Krzysztof, but Thomas and Krzysztof still disagree with some parts. Anyway I think it should be applied. Signed-off-by: Stefan Rompf <stefan@loplof.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 17:09:11 -08:00
Alexey Kuznetsov	28633514af	[NETLINK]: illegal use of pid in rtnetlink When a netlink message is not related to a netlink socket, it is issued by kernel socket with pid 0. Netlink "pid" has nothing to do with current->pid. I called it incorrectly, if it was named "port", the confusion would be avoided. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-09 16:43:41 -08:00
Thomas Graf	9ac4a16983	[RTNETLINK]: Use generic netlink receive queue processor Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 02:26:40 +01:00
Thomas Graf	a8f74b2288	[NETLINK]: Make netlink_callback->done() optional Most netlink families make no use of the done() callback, making it optional gets rid of all unnecessary dummy implementations. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 02:26:40 +01:00
Patrick McHardy	066286071d	[NETLINK]: Add "groups" argument to netlink_kernel_create Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:01:11 -07:00
Patrick McHardy	ac6d439d20	[NETLINK]: Convert netlink users to use group numbers instead of bitmasks Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 16:00:54 -07:00
Harald Welte	4fdb3bb723	[NETLINK]: Add properly module refcounting for kernel netlink sockets. - Remove bogus code for compiling netlink as module - Add module refcounting support for modules implementing a netlink protocol - Add support for autoloading modules that implement a netlink protocol as soon as someone opens a socket for that protocol Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:35:08 -07:00
Patrick McHardy	9ef1d4c7c7	[NETLINK]: Missing initializations in dumped data Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:55:30 -07:00
Patrick McHardy	b3563c4fbf	[NETLINK]: Clear padding in netlink messages Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:54:43 -07:00
Jamal Hadi Salim	9ed19f339e	[NETLINK]: Set correct pid for ioctl originating netlink events This patch ensures that netlink events created as a result of programns using ioctls (such as ifconfig, route etc) contains the correct PID of those events. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:55:51 -07:00
Jamal Hadi Salim	b6544c0b4c	[NETLINK]: Correctly set NLM_F_MULTI without checking the pid This patch rectifies some rtnetlink message builders that derive the flags from the pid. It is now explicit like the other cases which get it right. Also fixes half a dozen dumpers which did not set NLM_F_MULTI at all. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:54:12 -07:00
Thomas Graf	c7fb64db00	[NETLINK]: Neighbour table configuration and statistics via rtnetlink To retrieve the neighbour tables send RTM_GETNEIGHTBL with the NLM_F_DUMP flag set. Every neighbour table configuration is spread over multiple messages to avoid running into message size limits on systems with many interfaces. The first message in the sequence transports all not device specific data such as statistics, configuration, and the default parameter set. This message is followed by 0..n messages carrying device specific parameter sets. Although the ordering should be sufficient, NDTA_NAME can be used to identify sequences. The initial message can be identified by checking for NDTA_CONFIG. The device specific messages do not contain this TLV but have NDTPA_IFINDEX set to the corresponding interface index. To change neighbour table attributes, send RTM_SETNEIGHTBL with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3], NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked otherwise. Device specific parameter sets can be changed by setting NDTPA_IFINDEX to the interface index of the corresponding device. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-18 22:50:55 -07:00
David S. Miller	0f4821e7b9	[XFRM/RTNETLINK]: Decrement qlen properly in {xfrm_,rt}netlink_rcv(). If we free up a partially processed packet because it's skb->len dropped to zero, we need to decrement qlen because we are dropping out of the top-level loop so it will do the decrement for us. Spotted by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 16:15:59 -07:00
David S. Miller	09e1430598	[NETLINK]: Fix infinite loops in synchronous netlink changes. The qlen should continue to decrement, even if we pop partially processed SKBs back onto the receive queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 15:30:05 -07:00
Herbert Xu	2a0a6ebee1	[NETLINK]: Synchronous message processing. Let's recap the problem. The current asynchronous netlink kernel message processing is vulnerable to these attacks: 1) Hit and run: Attacker sends one or more messages and then exits before they're processed. This may confuse/disable the next netlink user that gets the netlink address of the attacker since it may receive the responses to the attacker's messages. Proposed solutions: a) Synchronous processing. b) Stream mode socket. c) Restrict/prohibit binding. 2) Starvation: Because various netlink rcv functions were written to not return until all messages have been processed on a socket, it is possible for these functions to execute for an arbitrarily long period of time. If this is successfully exploited it could also be used to hold rtnl forever. Proposed solutions: a) Synchronous processing. b) Stream mode socket. Firstly let's cross off solution c). It only solves the first problem and it has user-visible impacts. In particular, it'll break user space applications that expect to bind or communicate with specific netlink addresses (pid's). So we're left with a choice of synchronous processing versus SOCK_STREAM for netlink. For the moment I'm sticking with the synchronous approach as suggested by Alexey since it's simpler and I'd rather spend my time working on other things. However, it does have a number of deficiencies compared to the stream mode solution: 1) User-space to user-space netlink communication is still vulnerable. 2) Inefficient use of resources. This is especially true for rtnetlink since the lock is shared with other users such as networking drivers. The latter could hold the rtnl while communicating with hardware which causes the rtnetlink user to wait when it could be doing other things. 3) It is still possible to DoS all netlink users by flooding the kernel netlink receive queue. The attacker simply fills the receive socket with a single netlink message that fills up the entire queue. The attacker then continues to call sendmsg with the same message in a loop. Point 3) can be countered by retransmissions in user-space code, however it is pretty messy. In light of these problems (in particular, point 3), we should implement stream mode netlink at some point. In the mean time, here is a patch that implements synchronous processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 14:55:09 -07:00
Thomas Graf	db46edc6d3	[RTNETLINK] Cleanup rtnetlink_link tables Converts remaining rtnetlink_link tables to use c99 designated initializers to make greping a little bit easier. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 14:29:39 -07:00
Thomas Graf	f90a0a74b8	[RTNETLINK] Fix & cleanup rtm_min/rtm_max Converts rtm_min and rtm_max arrays to use c99 designated initializers for easier insertion of new message families. RTM_GETMULTICAST and RTM_GETANYCAST did not have the minimal message size specified which means that the netlink message was parsed for routing attributes starting from the header. Adds the proper minimal message sizes for these messages (netlink header + common rtnetlink header) to fix this issue. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-05-03 14:29:00 -07:00
Al Viro	b453257f05	[PATCH] kill gratitious includes of major.h under net/* A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-04-25 18:32:13 -07:00
David S. Miller	98f245e797	[RTNETLINK]: Add comma to final entry in link_rtnetlink_table Noticed by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-04-19 22:37:04 -07:00
Thomas Graf	240eed95eb	[RTNETLINK]: Protocol family wildcard dumping for routing rules Be kind to userspace and don't force them to hardcode protocol families just to have it changed again once we support routing rules for more than one protocol family. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-04-19 22:35:07 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

... 2 3 4 5 6

282 Commits