linux

Author	SHA1	Message	Date
Wu Jiajun-B06378	cd754a5745	gianfar: add GRO support Replace netif_receive_skb with napi_gro_receive. Signed-off-by: Jiajun Wu <b06378@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:43:34 -04:00
Eric Dumazet	c06fff6e17	af_packet: packet_getsockopt() cleanup Factorize code, since most fetched values are int type. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:36:42 -04:00
Neal Cardwell	900f65d361	tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock() This commit moves the (substantial) common code shared between tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family independent function, tcp_init_sock(). Centralizing this functionality should help avoid drift issues, e.g. where the IPv4 side is updated without a corresponding update to IPv6. There was already some drift: IPv4 initialized snd_cwnd to TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to 2 (in this case it should not matter, since snd_cwnd is also initialized in tcp_init_metrics(), but the general risks and maintenance overhead remain). When diffing the old and new code, note that new tcp_init_sock() function uses the order of steps from the tcp_v4_init_sock() implementation (the order is slightly different in tcp_v6_init_sock()). Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:36:42 -04:00
Eric Dumazet	e66e9a3147	net: allow better page reuse in splice(sock -> pipe) splice() from socket to pipe needs linear_to_page() helper to transfer skb header to part of page. We can reset the offset in the current sk->sk_sndmsg_page if we are the last user of the page. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:36:42 -04:00
Jiri Pirko	acd6996234	team: add per-port option for enabling/disabling ports Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:26:33 -04:00
Jiri Pirko	19a0b58e50	team: allow to enable/disable ports This patch changes content of hashlist (used to get port struct by computed index (0...en_port_count-1)). Now the hash list contains only enabled ports so userspace will be able to say what ports can be used for tx/rx. This becomes handy when userspace will need to disable ports which does not belong to active aggregator. By default, newly added port is enabled. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:26:33 -04:00
Jiri Pirko	4c78bb845b	team: lb: let userspace care about port macs Better to leave this for userspace Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:26:33 -04:00
Eric Dumazet	a74e910618	net: change big iov allocations iov of more than 8 entries are allocated in sendmsg()/recvmsg() through sock_kmalloc() As these allocations are temporary only and small enough, it makes sense to use plain kmalloc() and avoid sk_omem_alloc atomic overhead. Slightly changed fast path to be even faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Waychison <mikew@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 16:24:20 -04:00
Pavel Emelyanov	b139ba4e90	tcp: Repair connection-time negotiated parameters There are options, which are set up on a socket while performing TCP handshake. Need to resurrect them on a socket while repairing. A new sockoption accepts a buffer and parses it. The buffer should be CODE:VALUE sequence of bytes, where CODE is standard option code and VALUE is the respective value. Only 4 options should be handled on repaired socket. To read 3 out of 4 of these options the TCP_INFO sockoption can be used. An ability to get the last one (the mss_clamp) was added by the previous patch. Now the restore. Three of these options -- timestamp_ok, mss_clamp and snd_wscale -- are just restored on a socket. The sack_ok flag has 2 issues. First, whether or not to do sacks at all. This flag is just read and set back. No other sack info is saved or restored, since according to the standard and the code dropping all sack-ed segments is OK, the sender will resubmit them again, so after the repair we will probably experience a pause in connection. Next, the fack bit. It's just set back on a socket if the respective sysctl is set. No collected stats about packets flow is preserved. As far as I see (plz, correct me if I'm wrong) the fack-based congestion algorithm survives dropping all of the stats and repairs itself eventually, probably losing the performance for that period. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Pavel Emelyanov	5e6a3ce657	tcp: Report mss_clamp with TCP_MAXSEG option in repair mode The mss_clamp is the only connection-time negotiated option which cannot be obtained from the user space. Make the TCP_MAXSEG sockopt report one in the repair mode. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Pavel Emelyanov	c0e88ff0f2	tcp: Repair socket queues Reading queues under repair mode is done with recvmsg call. The queue-under-repair set by TCP_REPAIR_QUEUE option is used to determine which queue should be read. Thus both send and receive queue can be read with this. Caller must pass the MSG_PEEK flag. Writing to queues is done with sendmsg call and yet again -- the repair-queue option can be used to push data into the receive queue. When putting an skb into receive queue a zero tcp header is appended to its head to address the tcp_hdr(skb)->syn and the ->fin checks by the (after repair) tcp_recvmsg. These flags are both set to zero and that's why. The fin cannot be met in the queue while reading the source socket, since the repair only works for closed/established sockets and queueing fin packet always changes its state. The syn in the queue denotes that the respective skb's seq is "off-by-one" as compared to the actual payload length. Thus, at the rcv queue refill we can just drop this flag and set the skb's sequences to precise values. When the repair mode is turned off, the write queue seqs are updated so that the whole queue is considered to be 'already sent, waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the protocol POV the send queue looks like it was sent, but the data between the write_seq and snd_nxt is lost in the network. This helps to avoid another sockoption for setting the snd_nxt sequence. Leaving the whole queue in a 'not yet sent' state (as it will be after sendmsg-s) will not allow to receive any acks from the peer since the ack_seq will be after the snd_nxt. Thus even the ack for the window probe will be dropped and the connection will be 'locked' with the zero peer window. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Pavel Emelyanov	ee9952831c	tcp: Initial repair mode This includes (according to the previous description): * TCP_REPAIR sockoption This one just puts the socket in/out of the repair mode. Allowed for CAP_NET_ADMIN and for closed/established sockets only. When repair mode is turned off and the socket happens to be in the established state the window probe is sent to the peer to 'unlock' the connection. * TCP_REPAIR_QUEUE sockoption This one sets the queue which we're about to repair. The 'no-queue' is set by default. * TCP_QUEUE_SEQ sockoption Sets the write_seq/rcv_nxt of a selected repaired queue. Allowed for TCP_CLOSE-d sockets only. When the socket changes its state the other seq-s are changed by the kernel according to the protocol rules (most of the existing code is actually reused). * Ability to forcibly bind a socket to a port The sk->sk_reuse is set to SK_FORCE_REUSE. * Immediate connect modification The connect syscall initializes the connection, then directly jumps to the code which finalizes it. * Silent close modification The close just aborts the connection (similar to SO_LINGER with 0 time) but without sending any FIN/RST-s to peer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Pavel Emelyanov	370816aef0	tcp: Move code around This is just the preparation patch, which makes the needed for TCP repair code ready for use. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Pavel Emelyanov	4a17fd5229	sock: Introduce named constants for sk_reuse Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
Arnd Bergmann	59c55bdde8	drivers/net: decouple ISA and ISA_DMA_API The two options are separate, and some platforms (e.g. arm pxa) have ISA slots but no ISA dma controller, so they cannot build drivers using the DMA API functions. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:48 -04:00
Arnd Bergmann	3a22d5d5eb	sungem: use mdelay instead of udelay where necessary Some architectures like ARM cannot handle large numbers as arguments to udelay, so the drivers should use mdelay when delaying for multiple milliseconds. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:48 -04:00
Arnd Bergmann	32fd32a59c	donauboe: replace excessive udelay with msleep No driver should spin the CPU for 10ms, so better use an msleep, which is allowed in the ->suspend function. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Arnd Bergmann	31f31204df	8390: select CRC32 support The ax88796 driver uses the CRC32 functions, so make sure that they are actually enabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Arnd Bergmann	ee29e6134b	drivers/net: iwmc3200 depends on EXPERIMENTAL The iwmc3200 driver selects other code in Kconfig that depends on EXPERIMENTAL. Kconfig warns about this when CONFIG_EXPERIMENTAL is not already set, so logically, these options should also be marked experimental or promoted to stable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Arnd Bergmann	6e4a76291e	caif: include linux/io.h The caif_shmcore requires io.h in order to use ioremap, so include that explicitly to compile in all configurations. Also add a note about the use of ioremap(), which is not a proper way to map a DMA buffer into kernel space. It's not completely clear what the intention is for using ioremap, but it is clear that the result of ioremap must not simply be accessed using kernel pointers but should use readl/writel or memcpy_{to,from}io. Assigning the result of ioremap to a regular pointer that can also be set to something else is not ok. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Arnd Bergmann	65f6092517	drivers/net: add missing __devexit_p() annotations Drivers that refer to a __devexit function in an operations structure need to annotate that pointer with __devexit_p so replace it with a NULL pointer when the section gets discarded. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Arnd Bergmann	32a6d90bb3	davinci_cpdma: export symbols used by other drivers The davinci_emac driver can be a module, so the symbols it needs from the cpdma driver must be exported. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:28:47 -04:00
Richard Cochran	8e7073a388	pch_gbe: remove suspicious comment The time stamping code in this driver appears to have been copied from the ixp4xx_eth.c driver, including this timing comment. I had actually measured the time stamp delay on an IXP425, but I really doubt that this value also applies here. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Richard Cochran	32127a0a0a	pch_gbe: run the ptp bpf just once per packet This patch fixes code which needlessly ran the BPF twice per packet. Instead, we just run the classifier once and test whether the packet is any kind of PTP event message. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	358dfb6d77	pch_gbe: correct receive time stamp filtering This patch fixes the driver so that multicast PTP event messages can be recognized by the hardware time stamping unit. The station address register must be set according to the desired transport type. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	a6891ac70c	pch_gbe: do not set the channel control register We will let the pch_gbe code do that according to the receive time stamp filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	93c8acb599	pch_gbe: improve coding style This patch clears up a few coding style issues: - Makes two function definitions a bit nicer looking. - Remove unneeded parentheses. - Simplify macros for register bits. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	17cdedf3b3	pch_gbe: export a method to set the receive match address The code in pch_gbe_main will need to call this method in order to set the station address register according to the receive time stamping filter. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	eefc48b078	pch_gbe: reprogram multicast address register on reset The reset logic after a Rx FIFO overrun will clear the programmed multicast addresses. This patch fixes the issue by reprogramming the registers after the reset. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	5481c8cd83	pch_gbe: simplify transmit time stamping flag test This patch makes logic surrounding the test of the transmit time stamping flag more readable. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:45 -04:00
Takahiro Shimizu	d50566c727	pch_gbe: scale time stamps to nanoseconds This patch fixes the helper functions that give the transmit and receive time stamps to return nanoseconds, instead of arbitrary clock ticks. [ RC - Rebased Takahiro's changes and wrote a commit message explaining the changes. ] Signed-off-by: Takahiro Shimizu <tshimizu818@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:27:44 -04:00
Eric W. Biederman	5f568e5afe	net: Remove register_net_sysctl_table All of the users have been converted to use register_net_sysctl so we no longer need register_net_sysctl_table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	a5347fe36b	net: Delete all remaining instances of ctl_path We don't use struct ctl_path anymore so delete the exported constants. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	ec8f23ce0f	net: Convert all sysctl registrations to register_net_sysctl This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	f99e8f715a	net: Convert nf_conntrack_proto to use register_net_sysctl There isn't much advantage here except that strings paths are a bit easier to read, and converting everything to them allows me to kill off ctl_path. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	8607ddb867	net ipv4: Convert devinet to use register_net_sysctl Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:30 -04:00
Eric W. Biederman	6105e29320	net ipv6: Convert addrconf to use register_net_sysctl Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:29 -04:00
Eric W. Biederman	9bdcc88fa0	net decnet: Convert to use register_net_sysctl Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:29 -04:00
Eric W. Biederman	8f40a1f982	net neighbour: Convert to use register_net_sysctl Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:29 -04:00
Eric W. Biederman	6dceb03687	net ipv6: Don't use sysctl tables with .child entries. The sysctl core no longer natively understands sysctl tables with .child entries. Split the ipv6_table to remove the .child entries. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:29 -04:00
Eric W. Biederman	64fb301040	net llc: Don't use sysctl tables with .child entries. The sysctl core no longer natively understands sysctl tables with .child entries. Kill the intermediate tables and use register_net_sysctl directly to remove the need for compatibility code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:29 -04:00
Eric W. Biederman	0ca7a4c87d	net ax25: Simplify and cleanup the ax25 sysctl handling. Don't register/unregister every ax25 table in a batch. Instead register and unregister per device ax25 sysctls as ax25 devices come and go. This moves ax25 to be a completely modern sysctl user. Registering the sysctls in just the initial network namespace, removing the use of .child entries that are no longer natively supported by the sysctl core and taking advantage of the fact that there are no longer any ordering constraints between registering and unregistering different sysctl tables. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:22:28 -04:00
Eric W. Biederman	4e5ca78541	net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this won't cause any user visible changes. Delete the ipv4_path and the ipv4_skeleton these are no longer needed. Directly register the ipv4_route_table. And since I am an idiot remove the header definitions that I should have removed in the previous patch. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:18 -04:00
Eric W. Biederman	a5287acc6c	net ipv6: Remove unneeded registration of an empty net/ipv6/neigh sysctl no longer requires explicit creation of directories. The neigh directory is always populated with at least a default entry so this should cause no user visible changes. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:18 -04:00
Eric W. Biederman	45bad91498	net core: Remove unneeded creation of an empty net/core sysctl directory On the next line we register the net_core_table in net/core which creates the directory and ensures it exists. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:18 -04:00
Eric W. Biederman	5dd3df105b	net: Move all of the network sysctls without a namespace into init_net. This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:17 -04:00
Eric W. Biederman	4344475797	net: Kill register_sysctl_rotable register_sysctl_rotable never caught on as an interesting way to register sysctls. My take on the situation is that what we want are sysctls that we can only see in the initial network namespace. What we have implemented with register_sysctl_rotable are sysctls that we can see in all of the network namespaces and can only change in the initial network namespace. That is a very silly way to go. Just register the network sysctls in the initial network namespace and we don't have any weird special cases to deal with. The sysctls affected are: /proc/sys/net/ipv4/ipfrag_secret_interval /proc/sys/net/ipv4/ipfrag_max_dist /proc/sys/net/ipv6/ip6frag_secret_interval /proc/sys/net/ipv6/mld_max_msf I really don't expect anyone will miss them if they can't read them in a child user namespace. CC: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:17 -04:00
Eric W. Biederman	2ca794e5e8	net sysctl: Initialize the network sysctls sooner to avoid problems. If the netfilter code is modified to use register_net_sysctl_table the kernel fails to boot because the per net sysctl infrastructure is not set up soon enough. So to avoid races call net_sysctl_init from sock_init(). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:16 -04:00
Eric W. Biederman	bc8a36942a	net sysctl: Register an empty /proc/sys/net Implementation limitations of the sysctl core won't let /proc/sys/net reside in a network namespace. /proc/sys/net at least must be registered as a normal sysctl. So register /proc/sys/net early as an empty directory to guarantee we don't violate this constraint and hit bugs in the sysctl implementation. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:16 -04:00
Eric W. Biederman	ab41a2ca50	net: Implement register_net_sysctl. Right now all of the networking sysctl registrations are running in a compatibility mode. The native sysctl registration api takes a cstring for a path and a simple ctl_table. Implement register_net_sysctl so that we can register network sysctls without needing to use compatibility code in the sysctl core. Switching from a ctl_path to a cstring results in less boiler plate and denser code that is a little easier to read. I would simply have changed the arguments to register_net_sysctl_table instead of keeping two functions in parallel but gcc will allow a ctl_path pointer to be passed to a char * pointer with only issuing a warning, so completely incorrect code can be built. Since I have to change the function name I am taking advantage of the situation to let both register_net_sysctl and register_net_sysctl_table live for a short time in parallel which makes clean conversion patches a bit easier to read and write. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-20 21:21:15 -04:00

299869 Commits