Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.
lkml ref : http://lkml.org/lkml/2010/11/25/8
This mechanism is used in applications, one choice we have is to have a
recursion limit.
Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.
Add a recursion_level to unix socket, allowing up to 4 levels.
Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong of initializer entry was modified.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver's AUTHOR was changed to "Toshiharu Okada" from "Masayuki Ohtake".
I update the Kconfig, renamed "Topcliff" to "EG20T".
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 58933c64(ucc_geth: Fix the wrong the Rx/Tx FIFO size),
the UCC_GETH_UTFTT_INIT is set to 512 based on the recommendation
of the QE Reference Manual. But that will sometimes cause tx halt
while working in half duplex mode.
According to errata draft QE_GENERAL-A003(High Tx Virtual FIFO
threshold size can cause UCC to halt), setting UTFTT less than
[(UTFS x (M - 8)/M) - 128] will prevent this from happening
(M is the minimum buffer size).
The patch changes UTFTT back to 256.
Signed-off-by: Li Yang <leoli@freescale.com>
Cc: Jean-Denis Boyer <jdboyer@media5corp.com>
Cc: Andreas Schmitz <Andreas.Schmitz@riedel.net>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet sockets corresponding to passive connections are added to the bind hash
using ___inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
___inet_inherit_port() does not increment them. This results in both of these
going to -ve values.
This patch fixes this by calling inet_bind_hash() from ___inet_inherit_port(),
which does the right thing.
'bsockets' and 'num_owners' were introduced by commit a9d8f9110d
(inet: Allowing more than 64k connections and heavily optimize bind(0))
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some debug information about ehea not being able to
allocate enough spaces. Also it correctly updates the amount of available
skb.
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The HSO driver incorrectly creates a serial device instead of a net
device when disable_net is set. It shouldn't create anything for the
network interface.
Signed-off-by: Filip Aben <f.aben@option.com>
Reported-by: Piotr Isajew <pki@ex.com.pl>
Reported-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We register lapb when tty is created, but unregister it only when the
device is UP. So move the lapb_unregister to x25_asy_close_tty after
the device is down.
The old behaviour causes ldisc switching to fail each second attempt,
because we noted for us that the device is unused, so we use it the
second time, but labp layer still have it registered, so it fails
obviously.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Sergey Lapin <slapin@ossfans.org>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Tested-by: Sergey Lapin <slapin@ossfans.org>
Tested-by: Mikhail Ulyanov <ulyanov.mikhail@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were truncating the number of unicast and multicast MAC addresses
supported. Additionally, we were incorrectly computing the MAC Address
hash (a "1 << N" where we needed a "1ULL << N").
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating unit from ird might return several error codes
not only -EAGAIN, so it should not be changed and returned
precisely. Same time unit release procedure should be invoked
only if device is unregistering.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A single uninitialized padding byte is leaked to userspace.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"aup->enable" holds already the address pointing to the MAC enable
register. The bug was introduced by commit d0e7cb:
"au1000-eth: remove volatiles, switch to I/O accessors".
CC: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Wolfgang Grandegger <wg@denx.de>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
the current implementation does not track the greatest received value -
lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_win_from_space() does the following:
if (sysctl_tcp_adv_win_scale <= 0)
return space >> (-sysctl_tcp_adv_win_scale);
else
return space - (space >> sysctl_tcp_adv_win_scale);
"space" is int.
As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.
Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.
Which means we busyloop in tcp_fixup_rcvbuf().
Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].
Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312
Steps to reproduce:
echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
wget www.kernel.org
[softlockup]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The /proc/net/tcp leaks openreq sockets from other namespaces.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the if and else conditional because the code is in mainline and there
is no need in it being there.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Incorrect rcu check was used as rcu isn't done
under mutex here. Force check to 1 for now,
to stop it from complaining.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Don't declare variable sized array of iovecs on the stack since this
could cause stack overflow if msg->msgiovlen is large. Instead, coalesce
the user-supplied data into a new buffer and use a single iovec for it.
Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.
Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Later parts of econet_sendmsg() rely on saddr != NULL, so return early
with EINVAL if NULL was passed otherwise an oops may occur.
Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running randconfig with ktest.pl I hit this bug:
[ 16.101158] ICN-ISDN-driver Rev 1.65.6.8 mem=0x000d0000
[ 16.106376] icn: (line0) ICN-2B, port 0x320 added
[ 16.111064] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c1642880
[ 16.111066]
[ 16.121214] Pid: 1, comm: swapper Not tainted 2.6.37-rc2-test-00124-g6656b3f #8
[ 16.128499] Call Trace:
[ 16.130942] [<c0f51662>] ? printk+0x1d/0x23
[ 16.135200] [<c0f5153f>] panic+0x5c/0x162
[ 16.139286] [<c0d62a9a>] ? icn_addcard+0x6d/0xbe
[ 16.143975] [<c0445783>] print_tainted+0x0/0x8c
[ 16.148582] [<c1642880>] ? icn_init+0xd8/0xdf
[ 16.153012] [<c1642880>] icn_init+0xd8/0xdf
[ 16.157271] [<c04012e5>] do_one_initcall+0x8c/0x143
[ 16.162222] [<c16427a8>] ? icn_init+0x0/0xdf
[ 16.166566] [<c15f1a05>] kernel_init+0x13f/0x1da
[ 16.171256] [<c15f18c6>] ? kernel_init+0x0/0x1da
[ 16.175945] [<c0403bfe>] kernel_thread_helper+0x6/0x10
[ 16.181181] panic occurred, switching back to text console
Looking into it I found that the stack was corrupted by the assignment
of the Rev #. The variable rev is given 10 bytes, and in this output the
characters that were copied was: " 1.65.6.8 $". Which was 11 characters
plus the null ending character for a total of 12 bytes, thus corrupting
the stack.
This patch ups the variable size to 20 bytes as well as changes the
strcpy to strncpy. I also added a check to make sure '$' is found.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.
My analysis is we can eat all LOWMEM memory before unix_gc() being
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume huge amount of time to perform cleanup because of
huge working set.
One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.
This solves the OOM and also reduce overall latencies, and should not
slowdown normal workloads.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix unbalanced call to sdio_release_host() on the error path.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add new vendor for Broadcom 4318.
Signed-off-by: Daniel Klaffenbach <danielklaffenbach@gmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It seems that using ath9k_hw_stoppcurecv to stop rx dma is not enough.
When it's time to stop DMA, the PCU is still busy, so the rx enable
bit never clears.
Using ath9k_hw_abortpcurecv helps with getting rx stopped much faster,
with this change, I cannot reproduce the rx stop related WARN_ON anymore.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some aspects of PHY initialization are board dependent, things like
indicator LED connections and some clocking modes cannot be determined
by probing. The dev_flags element of struct phy_device can be used to
control these things if an appropriate value can be passed from the
Ethernet driver. We run into problems however if the PHY connections
are specified by the device tree. There is no way for the Ethernet
driver to know what flags it should pass.
If we are using the device tree, the struct phy_device will be
populated with the device tree node corresponding to the PHY, and we
can extract extra configuration information from there.
The next question is what should the format of that information be?
It is highly device specific, and the device tree representation
should not be tied to any arbitrary kernel defined constants. A
straight forward representation is just to specify the exact bits that
should be set using the "marvell,reg-init" property:
phy5: ethernet-phy@5 {
reg = <5>;
compatible = "marvell,88e1149r";
marvell,reg-init =
/* led[0]:1000, led[1]:100, led[2]:10, led[3]:tx */
<3 0x10 0 0x5777>, /* Reg 3,16 <- 0x5777 */
/* mix %:0, led[0123]:drive low off hiZ */
<3 0x11 0 0x00aa>, /* Reg 3,17 <- 0x00aa */
/* default blink periods. */
<3 0x12 0 0x4105>, /* Reg 3,18 <- 0x4105 */
/* led[4]:rx, led[5]:dplx, led[45]:drive low off hiZ */
<3 0x13 0 0x0a60>; /* Reg 3,19 <- 0x0a60 */
};
phy6: ethernet-phy@6 {
reg = <6>;
compatible = "marvell,88e1118";
marvell,reg-init =
/* Fix rx and tx clock transition timing */
<2 0x15 0xffcf 0>, /* Reg 2,21 Clear bits 4, 5 */
/* Adjust LED drive. */
<3 0x11 0 0x442a>, /* Reg 3,17 <- 0442a */
/* irq, blink-activity, blink-link */
<3 0x10 0 0x0242>; /* Reg 3,16 <- 0x0242 */
};
The Marvell PHYs have a page select register at register 22 (0x16), we
can specify any register by its page and register number. These are
the first and second word. The third word contains a mask to be ANDed
with the existing register value, and the fourth word is ORed with the
result to yield the new register value. The new marvell_of_reg_init
function leaves the page select register unchanged, so a call to it
can be dropped into the .config_init functions without unduly
affecting the state of the PHY.
If CONFIG_OF_MDIO is not set, there is no of_node, or no
"marvell,reg-init" property, the PHY initialization is unchanged.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E1149R is 10/100/1000 quad-gigabit Ethernet PHY. The
.config_aneg function can be shared with 88E1118, but it needs its own
.config_init.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The definition of the Marvell PHY page register is not specific to
88E1121, so rename the macro to MII_MARVELL_PHY_PAGE, and use it
throughout.
Suggested-by: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver appears to be mistaking the permission field with default value
in the case of debug and qlge_irq_type.
Driver is also passing debug as a bitmask into netif_msg_init()
which wants a number of bits. Ron Mercer suggests we should
change this to pass in -1 so the defaults get used instead,
which makes the default much less verbose.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ref count bug introduced by
commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date: Wed Oct 27 18:16:49 2010 +0000
ipv6: addrconf: don't remove address state on ifdown if the address
is being kept
Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It looks to me as if the second value of rate_err_array is intended
to be a decimal 625. However, with a leading 0 it becomes an octal
constant, and as such evaluates to a decimal 405.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 496c185c94 "atl1c: Add support
for Atheros AR8152 and AR8152" added the condition:
if (hw->nic_type == athr_l1c || hw->nic_type == athr_l2c_b)
for enabling OTP CLK, and the condition:
if (hw->nic_type == athr_l1c || hw->nic_type == athr_l2c)
for disabling OTP CLK. Since the two previously defined hardware
types are athr_l1c and athr_l2c, the latter condition appears to be
the correct one. Change the former to match.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.
In ceph, add the missing flag.
In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VMWare reports that the e1000 driver has a bug when bringing down the
interface, such that interrupts are not disabled in the hardware but the
driver stops reporting that it consumed the interrupt.
The fix is to set the driver's "down" flag later in the routine,
after all the timers and such have exited, preventing the interrupt
handler from being called and exiting early without handling the
interrupt.
CC: Anupam Chanda <anupamc@vmware.com>
CC: stable kernel <stable@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel-doc warning for sk_filter_rcu_release():
Warning(net/core/filter.c:586): missing initial short description on line:
* sk_filter_rcu_release: Release a socket filter by rcu_head
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Since interrupts are enabled only when open is called on the interface,
Attempting a firmware update operation when interface is down could lead to
partial success or failure of operation. This fix fails the request if
netif_running is false.
Signed-off-by: Sarveshwar Bandi <Sarveshwar.Bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols.
Therefore IP_VS can't be linked statically when conntrack
is built modular.
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
irttp_data_request() returns meaningful errorcodes, while irttp_udata_request()
just returns -1 in similar situations. Sync the two and the loglevels of the
accompanying output.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose reachable and retrans timer values in msecs instead of jiffies.
Both timer values are already exposed as msecs in the neighbour table
netlink interface.
The creation timestamp format with increased precision is kept but
cleaned up.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_PROTINFO exposes timer related per device settings in jiffies.
Change it to expose these values in msecs like the sysctl interface
does.
I did not find any users of IFLA_PROTINFO which rely on any of these
values and even if there are, they are likely already broken because
there is no way for them to reliably convert such a value to another
time format.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
VORTEX_PCI() could return NULL so it needs to be casted before
accessing any member of struct pci_dev. This fixes following
build failure. Likewise VORTEX_EISA() was changed also.
CC [M] drivers/net/3c59x.o
drivers/net/3c59x.c: In function 'acpi_set_WOL':
drivers/net/3c59x.c:3211:39: warning: dereferencing 'void *' pointer
drivers/net/3c59x.c:3211:39: error: request for member 'current_state' in something not a structure or union
make[3]: *** [drivers/net/3c59x.o] Error 1
make[2]: *** [drivers/net/3c59x.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipg.c:
The id [SUNDANCE, 0x1021] (=[0x13f0, 0x1021]) is defined
at dl2k.h and ipg.c.
But this device works better with dl2k driver.
This problem is similar with the commit
[25cca53527
ipg: Remove device claimed by dl2k from pci id table]
at 11 Feb 2010.
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_netdev() is not checked here for NULL return value. dev is
check instead. It might lead to NULL dereference of ndev.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting tid information in the TX header is required only for QoS
frames. Not handling this case causes severe data loss with some APs.
Cc: stable@kernel.org
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>