linux

Author	SHA1	Message	Date
Eric Dumazet	9fdc6bef5f	tuntap: dont use a private kmem_cache Commit `96442e4242` (tuntap: choose the txq based on rxq) added a per tun_struct kmem_cache. As soon as several tun_struct are used, we get an error because two caches cannot have same name. Use the default kmalloc()/kfree_rcu(), as it reduce code size and doesn't have performance impact here. Reported-by: Paul Moore <pmoore@redhat.com> Tested-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-21 13:14:01 -08:00
Jason Wang	d32649d171	tuntap: fix sparse warning Make tun_enable_queue() static to fix the sparse warning: drivers/net/tun.c:399:19: sparse: symbol 'tun_enable_queue' was not declared. Should it be static? Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-17 20:49:06 -08:00
Eric Dumazet	76fe45812a	tuntap: reset network header before calling skb_get_rxhash() Commit `499744209b` (tuntap: dont use skb after netif_rx_ni(skb)) introduced another bug. skb_get_rxhash() needs to access the network header, and it was set for us in netif_rx_ni(). We need to reset network header or else skb_flow_dissect() behavior is out of control. Reported-and-tested-by: Kirill A. Shutemov <kirill@shutemov.name> Tested-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-17 12:32:44 -08:00
Jason Wang	4008e97f86	tuntap: fix ambigious multiqueue API The current multiqueue API is ambigious which may confuse both user and LSM to do things correctly: - Both TUNSETIFF and TUNSETQUEUE could be used to create the queues of a tuntap device. - TUNSETQUEUE were used to disable and enable a specific queue of the device. But since the state of tuntap were completely removed from the queue, it could be used to attach to another device (there's no such kind of requirement currently, and it needs new kind of LSM policy. - TUNSETQUEUE could be used to attach to a persistent device without any queues. This kind of attching bypass the necessary checking during TUNSETIFF and may lead unexpected result. So this patch tries to make a cleaner and simpler API by: - Only allow TUNSETIFF to create queues. - TUNSETQUEUE could be only used to disable and enabled the queues of a device, and the state of the tuntap device were not detachd from the queues when it was disabled, so TUNSETQUEUE could be only used after TUNSETIFF and with the same device. This is done by introducing a list which keeps track of all queues which were disabled. The queue would be moved between this list and tfiles[] array when it was enabled/disabled. A pointer of the tun_struct were also introdued to track the device it belongs to when it was disabled. After the change, the isolation between management and application could be done through: TUNSETIFF were only called by management software and TUNSETQUEUE were only called by application.For LSM/SELinux, the things left is to do proper check during tun_set_queue() if needed. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-14 13:14:06 -05:00
Eric Dumazet	499744209b	tuntap: dont use skb after netif_rx_ni(skb) On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote: > Since todays net merge, I see this when I start openvpn.. > > general protection fault: 0000 [#1] PREEMPT SMP > Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy > CPU 0 > Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14 /D975XBX > RIP: 0010:[<ffffffff815b54a4>] [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0 > RSP: 0018:ffff88007d0d9c48 EFLAGS: 00010206 > RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a > RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80 > RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80 > R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80 > FS: 00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60) > Stack: > ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000 > ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831 > ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000 > Call Trace: > [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0 > [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290 > [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0 > [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun] > [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0 > [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0 > [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun] > [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50 > [<ffffffff811db917>] do_sync_write+0xa7/0xe0 > [<ffffffff811dc01f>] vfs_write+0xaf/0x190 > [<ffffffff811dc375>] sys_write+0x55/0xa0 > [<ffffffff81705540>] tracesys+0xdd/0xe2 > Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48 > RIP [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0 > RSP <ffff88007d0d9c48> > ---[ end trace 6d42c834c72c002e ]--- > > > Faulting instruction is > > 0: 8b 03 mov (%rbx),%eax > > rbx is slab poison (-20) so this looks like a use-after-free here... > > flow->ports = *ports; > 314: 8b 03 mov (%rbx),%eax > 316: 41 89 46 08 mov %eax,0x8(%r14) > > in the inlined skb_header_pointer in skb_flow_dissect > > Dave > commit `96442e4242` (tuntap: choose the txq based on rxq) added a use after free. Cache rxhash in a temp variable before calling netif_rx_ni() Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-13 12:58:11 -05:00
stephen hemminger	a676847b39	tun: allow setting ethernet addresss while running This is a pure software device, and ok with live address change. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-11 12:49:53 -05:00
Paul Moore	b3943aef7e	tun: correctly report an error in tun_flow_init() On error, the error code from tun_flow_init() is lost inside tun_set_iff(), this patch fixes this by assigning the tun_flow_init() error code to the "err" variable which is returned by the tun_flow_init() function on error. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-07 13:20:46 -05:00
Michael S. Tsirkin	5d09710925	tun: only queue packets on device Historically tun supported two modes of operation: - in default mode, a small number of packets would get queued at the device, the rest would be queued in qdisc - in one queue mode, all packets would get queued at the device This might have made sense up to a point where we made the queue depth for both modes the same and set it to a huge value (500) so unless the consumer is stuck the chance of losing packets is small. Thus in practice both modes behave the same, but the default mode has some problems: - if packets are never consumed, fragments are never orphaned which cases a DOS for sender using zero copy transmit - overrun errors are hard to diagnose: fifo error is incremented only once so you can not distinguish between userspace that is stuck and a transient failure, tcpdump on the device does not show any traffic Userspace solves this simply by enabling IFF_ONE_QUEUE but there seems to be little point in not doing the right thing for everyone, by default. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 15:07:36 -05:00
Jason Wang	eb0fb363f9	tuntap: attach queue 0 before registering netdevice We attach queue 0 after registering netdevice currently. This leads to call netif_set_real_num_{tx\|rx}_queues() after registering the netdevice. Since we allow tun/tap has a maximum of 1024 queues, this may lead a huge number of uevents to be injected to userspace since we create 2048 kobjects and then remove 2046. Solve this problem by attaching queue 0 and set the real number of queues before registering netdevice. Reported-by: Jiri Slaby <jslaby@suse.cz> Tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-03 13:47:57 -05:00
Rami Rosen	3872baf618	tun: put correct method name in a debug message. This patch puts the correct method name, tun_do_read, in a debug message. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:22:10 -05:00
Rami Rosen	36fe8c0973	vtun: fix typos. This patch fixes four typos in drivers/net/vtun.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-26 17:22:09 -05:00
Rami Rosen	9ce99cf6dc	tun: change tun_get_iff() prototype. This patch changes tun_get_iff() prototype to return void as it never fails. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-23 14:24:46 -05:00
Eric W. Biederman	c260b7722f	net: Allow userns root to control tun and tap devices Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) calls to ns_capable(net->user_ns,CAP_NET_ADMIN) calls. Allow setting of the tun iff flags. Allow creating of tun devices. Allow adding a new queue to a tun device. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-19 14:15:54 -05:00
Michael S. Tsirkin	149d36f718	tun: report orphan frags errors to zero copy callback When tun transmits a zero copy skb, it orphans the frags which might need to allocate extra memory, in atomic context. If that fails, notify ubufs callback before freeing the skb as a hint that device should disable zerocopy mode. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Jason Wang	96442e4242	tuntap: choose the txq based on rxq This patch implements a simple multiqueue flow steering policy - tx follows rx for tun/tap. The idea is simple, it just choose the txq based on which rxq it comes. The flow were identified through the rxhash of a skb, and the hash to queue mapping were recorded in a hlist with an ageing timer to retire the mapping. The mapping were created when tun receives packet from userspace, and was quired in .ndo_select_queue(). I run co-current TCP_CRR test and didn't see any mapping manipulation helpers in perf top, so the overhead could be negelected. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:09 -04:00
Jason Wang	cde8b15f1a	tuntap: add ioctl to attach or detach a file form tuntap device Sometimes usespace may need to active/deactive a queue, this could be done by detaching and attaching a file from tuntap device. This patch introduces a new ioctls - TUNSETQUEUE which could be used to do this. Flag IFF_ATTACH_QUEUE were introduced to do attaching while IFF_DETACH_QUEUE were introduced to do the detaching. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:08 -04:00
Jason Wang	c8d68e6be1	tuntap: multiqueue support This patch converts tun/tap to a multiqueue devices and expose the multiqueue queues as multiple file descriptors to userspace. Internally, each tun_file were abstracted as a queue, and an array of pointers to tun_file structurs were stored in tun_structure device, so multiple tun_files were allowed to be attached to the device as multiple queues. When choosing txq, we first try to identify a flow through its rxhash, if it does not have such one, we could try recorded rxq and then use them to choose the transmit queue. This policy may be changed in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:08 -04:00
Jason Wang	6e914fc707	tuntap: RCUify dereferencing between tun_struct and tun_file RCU were introduced in this patch to synchronize the dereferences between tun_struct and tun_file. All tun_{get\|put} were replaced with RCU, the dereference from one to other must be done under rtnl lock or rcu read critical region. This is needed for the following patches since the one of the goal of multiqueue tuntap is to allow adding or removing queues during workload. Without RCU, control path would hold tx locks when adding or removing queues (which may cause sme delay) and it's hard to change the number of queues without stopping the net device. With the help of rcu, there's also no need for tun_file hold an refcnt to tun_struct. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:07 -04:00
Jason Wang	54f968d6ef	tuntap: move socket to tun_file Current tuntap makes use of the socket receive queue as its tx queue. To implement multiple tx queues for tuntap and enable the ability of adding and removing queues during workload, the first step is to move the socket related structures to tun_file. Then we could let multiple fds/sockets to be attached to the tuntap. This patch removes tun_sock and moves socket related structures from tun_sock or tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog, they are still kept in tun_structure since they are used to filter packets for the net device instead of per transmit queue (at least I see no requirements for them). After those changes, socket were created and destroyed during file open and close (instead of device creation and destroy), the socket structures could be dereferenced from tun_file instead of the file of tun_struct structure itself. For persisent device, since we purge during datching and wouldn't queue any packets when no interface were attached, there's no behaviod changes before and after this patch, so the changes were transparent to the userspace. To keep the attributes such as sndbuf, socket filter and vnet header, those would be re-initialize after a new interface were attached to an persist device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:07 -04:00
Jason Wang	1e5883382c	tuntap: log the unsigned informaiton with %u Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-01 11:14:06 -04:00
Daniel Wagner	6a328d8c6f	cgroup: net_cls: Rework update socket logic The cgroup logic part of net_cls is very similar as the one in net_prio. Let's stream line the net_cls logic with the net_prio one. The net_prio update logic was changed by following commit (note there were some changes necessary later on) commit `406a3c638c` Author: John Fastabend <john.r.fastabend@intel.com> Date: Fri Jul 20 10:39:25 2012 +0000 net: netprio_cgroup: rework update socket logic Instead of updating the sk_cgrp_prioidx struct field on every send this only updates the field when a task is moved via cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed. For example in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set but as soon as data is sent the kernel worker thread isssues a send and sk_cgrp_prioidx is updated with the kernel worker threads value which is the default case. It seems more correct to only update the field when the user explicitly sets it via control group infrastructure. This allows the users to manage sockets that may be used with other threads. Since classid is now updated when the task is moved between the cgroups, we don't have to call sock_update_classid() from various places to ensure we always using the latest classid value. [v2: Use iterate_fd() instead of open coding] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Li Zefan <lizefan@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Joe Perches <joe@perches.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 03:40:51 -04:00
Daniel Wagner	fd9a08a7b8	cgroup: net_cls: Pass in task to sock_update_classid() sock_update_classid() assumes that the update operation always are applied on the current task. sock_update_classid() needs to know on which tasks to work on in order to be able to migrate task between cgroups using the struct cgroup_subsys attach() callback. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Joe Perches <joe@perches.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: <netdev@vger.kernel.org> Cc: <cgroups@vger.kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-26 03:40:50 -04:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Daniel Wagner	f341980771	cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h The only user of sock_update_classid() is net/socket.c which happens to include cls_cgroup.h directly. tj: Fix build breakage due to missing cls_cgroup.h inclusion in drivers/net/tun.c reported in linux-next by Stephen. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org	2012-09-14 09:55:57 -07:00
Eric W. Biederman	0625c883bc	userns: Convert tun/tap to use kuid and kgid where appropriate Cc: Maxim Krasnyansky <maxk@qualcomm.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:31 -07:00
Stanislav Kinsbursky	66d1b9263a	tun: don't zeroize sock->file on detach This is a fix for bug, introduced in 3.4 kernel by commit `1ab5ecb90c` ("tun: don't hold network namespace by tun sockets"), which, among other things, replaced simple sock_put() by sk_release_kernel(). Below is sequence, which leads to oops for non-persistent devices: tun_chr_close() tun_detach() <== tun->socket.file = NULL tun_free_netdev() sk_release_sock() sock_release(sock->file == NULL) iput(SOCK_INODE(sock)) <== dereference on NULL pointer This patch just removes zeroing of socket's file from __tun_detach(). sock_release() will do this. Cc: stable@vger.kernel.org Reported-by: Ruan Zhijie <ruanzhijie@hotmail.com> Tested-by: Ruan Zhijie <ruanzhijie@hotmail.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:16:14 -07:00
David S. Miller	8bbb181308	tun: Fix formatting. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:52:48 -07:00
Mathias Krause	a117dacde0	net/tun: fix ioctl() based info leaks The tun module leaks up to 36 bytes of memory by not fully initializing a structure located on the stack that gets copied to user memory by the TUNGETIFF and SIOCGIFHWADDR ioctl()s. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:31 -07:00
Michael S. Tsirkin	0690899b4d	tun: experimental zero copy tx support Let vhost-net utilize zero copy tx when used with tun. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Michael S. Tsirkin	868eefeb17	tun: orphan frags on xmit tun xmit is actually receive of the internal tun socket. Orphan the frags same as we do for normal rx path. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-22 12:39:33 -07:00
Mikulas Patocka	b09e786bd1	tun: fix a crash bug and a memory leak This patch fixes a crash tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel -> sock_release -> iput(SOCK_INODE(sock)) introduced by commit `1ab5ecb90c` The problem is that this socket is embedded in struct tun_struct, it has no inode, iput is called on invalid inode, which modifies invalid memory and optionally causes a crash. sock_release also decrements sockets_in_use, this causes a bug that "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when creating and closing tun devices. This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs sock_release to not free the inode and not decrement sockets_in_use, fixing both memory corruption and sockets_in_use underflow. It should be backported to 3.3 an 3.4 stabke. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-20 11:21:06 -07:00
Joe Perches	344dc8ede1	drivers/net: Use eth_random_addr Convert the existing uses of random_ether_addr to the new eth_random_addr. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-16 22:38:28 -07:00
Joe Perches	2e42e4747e	drivers/net: Convert compare_ether_addr to ether_addr_equal Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:33:01 -04:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
David S. Miller	4da0bd7365	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-03-18 23:29:41 -04:00
Stanislav Kinsbursky	1ab5ecb90c	tun: don't hold network namespace by tun sockets v3: added previously removed sock_put() to the tun_release() callback, because sk_release_kernel() doesn't drop the socket reference. v2: sk_release_kernel() used for socket release. Dummy tun_release() is required for sk_release_kernel() ---> sock_release() ---> sock->ops->release() call. TUN was designed to destroy it's socket on network namesapce shutdown. But this will never happen for persistent device, because it's socket holds network namespace. This patch removes of holding network namespace by TUN socket and replaces it by creating socket in init_net and then changing it's net it to desired one. On shutdown socket is moved back to init_net prior to final put. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-03-12 17:14:00 -07:00
Danny Kukawka	f2cedb63df	net: replace random_ether_addr() with eth_hw_addr_random() Replace usage of random_ether_addr() with eth_hw_addr_random() to set addr_assign_type correctly to NET_ADDR_RANDOM. Change the trivial cases. v2: adapt to renamed eth_hw_addr_random() Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-02-15 15:34:16 -05:00
Rick Jones	84b4050111	Sweep away N/A fw_version dustbunnies from the .get_drvinfo routine of a number of drivers Per discussion with Ben Hutchings and David Miller, go through and remove assignments of "N/A" to fw_version in various drivers' .get_drvinfo routines. While there clean-up some use of bare constants and such. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 16:43:32 -05:00
Michał Mirosław	c8f44affb7	net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:10 -05:00
Rick Jones	33a5ba144e	net: sweep-up some straglers in strlcpy conversion of .get_drvinfo routines Convert some remaining straglers' .get_drvinfo routines to use strlcpy rather than strcpy/strncpy. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:38:55 -05:00
Jiri Pirko	afc4b13df1	net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Neil Horman	550fd08c2c	net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-27 22:39:30 -07:00
David S. Miller	9f6ec8d697	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c	2011-06-20 22:29:08 -07:00
Neil Horman	bebd097a0a	tun: teach the tun/tap driver to support netpoll Commit `8d8fc29d02` changed the behavior of slave devices in regards to netpoll. Specifically it created a mutually exclusive relationship between being a slave and a netpoll-capable device. This creates problems for KVM because guests relied on needing netconsole active on a slave device to a bridge. Ideally libvirtd could just attach netconsole to the bridge device instead, but thats currently infeasible, because while the bridge device supports netpoll, it requires that all slave interface also support it, but the tun/tap driver currently does not. The most direct solution is to teach tun/tap to support netpoll, which is implemented by the patch below. I've not tested this yet, but its pretty straightforward. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Rik van Riel <riel@redhat.com> CC: Rik van Riel <riel@redhat.com> CC: Maxim Krasnyansky <maxk@qualcomm.com> CC: Cong Wang <amwang@redhat.com> CC: "David S. Miller" <davem@davemloft.net> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>	2011-06-16 23:53:10 -04:00
Jason Wang	10a8d94a95	virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID There's no need for the guest to validate the checksum if it have been validated by host nics. So this patch introduces a new flag - VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum examing in guest. The backend (tap/macvtap) may set this flag when met skbs with CHECKSUM_UNNECESSARY to save cpu utilization. No feature negotiation is needed as old driver just ignore this flag. Iperf shows 12%-30% performance improvement for UDP traffic. For TCP, when gro is on no difference as it produces skb with partial checksum. But when gro is disabled, 20% or even higher improvement could be measured by netperf. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-11 15:57:47 -07:00
Amos Kong	61a5ff15eb	tun: do not put self in waitq if doing a nonblock read Perf shows a relatively high rate (about 8%) race in spin_lock_irqsave() when doing netperf between external host and guest. It's mainly becuase the lock contention between the tun_do_read() and tun_xmit_skb(), so this patch do not put self into waitqueue to reduce this kind of race. After this patch, it drops to 4%. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:27:10 -07:00
stephen hemminger	6f7c156c08	tun: dont force inline of functions Current standard practice is to not mark most functions as inline and let compiler decide instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:08:39 -07:00
stephen hemminger	a504b86e71	tun: reserves space for network in skb The tun driver allocates skb's to hold data from user and then passes the data into the network stack as received data. Most network devices allocate the receive skb with routines like dev_alloc_skb() that reserves additional space for use by network protocol stack but tun does not. Because of the lack of padding, when the packet is passed through bridge netfilter a new skb has to be allocated. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-09 00:08:38 -07:00
Joe Perches	6403eab143	drivers/net: Remove unnecessary semicolons Semicolons are not necessary after switch/while/for/if braces so remove them. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-05 14:33:40 -07:00
Jiri Pirko	1c5cae815d	net: call dev_alloc_name from register_netdevice Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by `84c49d8c3e` Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-05 10:57:45 -07:00

1 2 3 4

178 Commits