Commit Graph

400326 Commits

Author SHA1 Message Date
Andi Kleen
5843ef4213 tcp: Always set options to 0 before calling tcp_established_options
tcp_established_options assumes opts->options is 0 before calling,
as it read modify writes it.

For the tcp_current_mss() case the opts structure is not zeroed,
so this can be done with uninitialized values.

This is ok, because ->options is not read in this path.
But it's still better to avoid the operation on the uninitialized
field. This shuts up a static code analyzer, and presumably
may help the optimizer.

Cc: netdev@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:32:43 -04:00
Andi Kleen
58e4e1f6ca igb: Avoid uninitialized advertised variable in eee_set_cur
eee_get_cur assumes that the output data is already zeroed. It can
read-modify-write the advertised field:

              if (ipcnfg & E1000_IPCNFG_EEE_100M_AN)
2594			edata->advertised |= ADVERTISED_100baseT_Full;

This is ok for the normal ethtool eee_get call, which always
zeroes the input data before.

But eee_set_cur also calls eee_get_cur and it did not zero the input
field. Later on it then compares agsinst the field, which can contain partial
stack garbage.

Zero the input field in eee_set_cur() too.

Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:31:48 -04:00
David S. Miller
52f77ba925 Merge branch 'calxedaxgmac'
Rob Herring says:

====================
This is a couple of fixes related to xgmac_set_rx_mode. The changes are
necessary for "bridge fdb add" to work correctly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:10:45 -04:00
Rob Herring
0cf2f38007 net: calxedaxgmac: determine number of address filters at runtime
Highbank and Midway xgmac h/w have different number of MAC address filter
registers with 7 and 31, respectively. Highbank has been wrong, so fix it
and detect the number of filter registers at run-time. Unfortunately,
the version register is the same on both SOCs, so simply test if write to
the last filter register will take a value. It always reads as 0 if not.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:10:29 -04:00
Rob Herring
8a8c3f5beb net: calxedaxgmac: add uc and mc filter addresses in promiscuous mode
Even in promiscuous mode, we need to add filter addresses for correct
operation. This fixes silent failures when using a bridge and adding
addresses using the "bridge fdb add" command.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:10:29 -04:00
Rob Herring
8c1c58ec70 net: calxedaxgmac: fix clearing of old filter addresses
In commit 2ee68f621a (net: calxedaxgmac: fix various errors in
xgmac_set_rx_mode), a fix to clean-up old address entries was added.
However, the loop to zero out the entries failed to increment the register
address resulting in only 1 entry getting cleared. Fix this to correctly
use the loop index. Also, the end of the loop condition was off by 1 and
should have been <= rather than <.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:10:29 -04:00
Mathias Krause
6865d1e834 unix_diag: fix info leak
When filling the netlink message we miss to wipe the pad field,
therefore leak one byte of heap memory to userland. Fix this by
setting pad to 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:08:24 -04:00
David S. Miller
d022af2790 Merge branch 'connector'
Mathias Krause says:

====================
This series fixes a few netlink related issues of the connector interface.

The first two patches are bug fixes. The last two are cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:03:56 -04:00
Mathias Krause
05742faf25 connector - documentation: simplify netlink message length assignment
Use the precalculated size instead of obfuscating the message length
calculation by first subtracting the netlink header length from size
and then use the NLMSG_LENGTH() macro to add it back again.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:03:51 -04:00
Mathias Krause
ac73bf50b7 connector: use 'size' everywhere in cn_netlink_send()
We calculated the size for the netlink message buffer as size. Use size
in the memcpy() call as well instead of recalculating it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:03:50 -04:00
Mathias Krause
162b2bedc0 connector: use nlmsg_len() to check message length
The current code tests the length of the whole netlink message to be
at least as long to fit a cn_msg. This is wrong as nlmsg_len includes
the length of the netlink message header. Use nlmsg_len() instead to
fix this "off-by-NLMSG_HDRLEN" size check.

Cc: stable@vger.kernel.org  # v2.6.14+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:03:50 -04:00
Mathias Krause
e727ca82e0 proc connector: fix info leaks
Initialize event_data for all possible message types to prevent leaking
kernel stack contents to userland (up to 20 bytes). Also set the flags
member of the connector message to 0 to prevent leaking two more stack
bytes this way.

Cc: stable@vger.kernel.org  # v2.6.15+
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02 16:03:50 -04:00
Linus Torvalds
c31eeaced2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking changes from David Miller:

 1) Multiply in netfilter IPVS can overflow when calculating destination
    weight.  From Simon Kirby.

 2) Use after free fixes in IPVS from Julian Anastasov.

 3) SFC driver bug fixes from Daniel Pieczko.

 4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.

 5) Locking and encapsulation fixes to serial line CAN driver, from
    Andrew Naujoks.

 6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
    Eilon Greenstein, and Ariel Elior.

 7) In lapb, if no other packets are outstanding, T1 timeouts actually
    stall things and no packet gets sent.  Fix from Josselin Costanzi.

 8) ICMP redirects should not make it to the socket error queues, from
    Duan Jiong.

 9) Fix bugs in skge DMA mapping error handling, from Nikulas Patocka.

10) Fix setting of VLAN priority field on via-rhine driver, from Roget
    Luethi.

11) Fix TX stalls and VLAN promisc programming in be2net driver from
    Ajit Khaparde.

12) Packet padding doesn't get handled correctly in new usbnet SG
    support code, from Ming Lei.

13) Fix races in netdevice teardown wrt.  network namespace closing.
    From Eric W.  Biederman.

14) Fix potential missed initialization of net_secret if not TCP
    connections are openned.  From Eric Dumazet.

15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
    Aleksander Morgado.

16) skb_cow_head() can change skb->data and thus packet header pointers,
    don't use stale ip_hdr reference in ip_tunnel code.

17) Backend state transition handling fixes in xen-netback, from Paul
    Durrant.

18) Packet offset for AH protocol is handled wrong in flow dissector,
    from Eric Dumazet.

19) Taking down an fq packet scheduler instance can leave stale packets
    in the queues, fix from Eric Dumazet.

20) Fix performance regressions introduced by TCP Small Queues.  From
    Eric Dumazet.

21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
    Hannes Frederic Sowa.

22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
    reference to the ipv4/ipv6 specific network device state, so use the
    reference put that will check and release the object if the
    reference hits zero.  From Salam Noureddine.

23) Fix memory corruption in ip_tunnel driver, and use skb_push()
    instead of __skb_push() so that similar bugs are less hard to find.
    From Steffen Klassert.

24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
    Nicolas Dichtel.

25) fq scheduler doesn't accurately rate limit in certain circumstances,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
  pkt_sched: fq: rate limiting improvements
  ip6tnl: allow to use rtnl ops on fb tunnel
  sit: allow to use rtnl ops on fb tunnel
  ip_tunnel: Remove double unregister of the fallback device
  ip_tunnel_core: Change __skb_push back to skb_push
  ip_tunnel: Add fallback tunnels to the hash lists
  ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
  qlcnic: Fix SR-IOV configuration
  ll_temac: Reset dma descriptors indexes on ndo_open
  skbuff: size of hole is wrong in a comment
  ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
  ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
  ethernet: moxa: fix incorrect placement of __initdata tag
  ipv6: gre: correct calculation of max_headroom
  powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
  Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
  bonding: Fix broken promiscuity reference counting issue
  tcp: TSQ can use a dynamic limit
  dm9601: fix IFF_ALLMULTI handling
  pkt_sched: fq: qdisc dismantle fixes
  ...
2013-10-01 12:58:48 -07:00
Linus Torvalds
0b936842c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
 "Just a single bug fix to a regression added during some strlcpy()
  conversions"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix buggy strlcpy() conversion in ldom_reboot().
2013-10-01 12:57:59 -07:00
Linus Torvalds
517bf8fc21 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lru leak fix from Al Viro:
 "The fix in "super: fix for destroy lrus" didn't - they need to be
  destroyed, all right, but that's the wrong place..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/super.c: fix lru_list leak for real
2013-10-01 10:28:11 -07:00
Linus Torvalds
77c4ad8e23 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull two KVM fixes from Gleb Natapov.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
  ARM: kvm: rename cpu_reset to avoid name clash
2013-10-01 10:25:10 -07:00
Al Viro
c2d22ecd3c fs/super.c: fix lru_list leak for real
Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
the right place is destroy_super().  As it is, we leak them if sget()
decides that new superblock it has allocated (and never shown to
anybody) isn't needed and should be freed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-01 13:11:21 -04:00
Eric Dumazet
0eab5eb7a3 pkt_sched: fq: rate limiting improvements
FQ rate limiting suffers from two problems, reported
by Steinar :

1) FQ enforces a delay when flow quantum is exhausted in order
to reduce cpu overhead. But if packets are small, current
delay computation is slightly wrong, and observed rates can
be too high.

Steinar had this problem because he disabled TSO and GSO,
and default FQ quantum is 2*1514.

(Of course, I wish recent TSO auto sizing changes will help
to not having to disable TSO in the first place)

2) maxrate was not used for forwarded flows (skbs not attached
to a socket)

Tested:

tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
netperf -H lpq84 -l 1000 &
sleep 10 ; tc -s qdisc show dev eth0
qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
 quantum 3028 initial_quantum 15140 maxrate 8000Kbit
 Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
 rate 7831Kbit 653pps backlog 7570b 5p requeues 0
  44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
  0 gc, 0 highprio, 5545 throttled

lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 13:00:38 -04:00
Nicolas Dichtel
bb8140947a ip6tnl: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by c075b13098 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824 ("ip6tnl: add x-netns support")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:55:53 -04:00
Nicolas Dichtel
205983c437 sit: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by ba3e3f50a0 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf ("sit: add support of x-netns")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:55:53 -04:00
David S. Miller
9cb1712468 Merge branch 'ip_tunnel'
ip_tunnel bug fixes from Steffen Klassert.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:28 -04:00
Steffen Klassert
cfe4a53692 ip_tunnel: Remove double unregister of the fallback device
When queueing the netdevices for removal, we queue the
fallback device twice in ip_tunnel_destroy(). The first
time when we queue all netdevices in the namespace and
then again explicitly. Fix this by removing the explicit
queueing of the fallback device.

Bug was introduced when network namespace support was added
with commit 6c742e714d ("ipip: add x-netns support").

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
78a3694d44 ip_tunnel_core: Change __skb_push back to skb_push
Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
moved the IP header installation to iptunnel_xmit() and
changed skb_push() to __skb_push(). This makes possible
bugs hard to track down, so change it back to skb_push().

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
6701328262 ip_tunnel: Add fallback tunnels to the hash lists
Currently we can not update the tunnel parameters of
the fallback tunnels because we don't find them in the
hash lists. Fix this by adding them on initialization.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
Steffen Klassert
3e08f4a72f ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
We might extend the used aera of a skb beyond the total
headroom when we install the ipip header. Fix this by
calling skb_cow_head() unconditionally.

Bug was introduced with commit c544193214
("GRE: Refactor GRE tunneling code.")

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:42:16 -04:00
David S. Miller
e024bdc051 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:

* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
  Patrick McHardy.

* Fix possible weight overflow in lblc and lblcr schedulers due to
  32-bits arithmetics, from Simon Kirby.

* Fix possible memory access race in the lblc and lblcr schedulers,
  introduced when it was converted to use RCU, two patches from
  Julian Anastasov.

* Fix hard dependency on CPU 0 when reading per-cpu stats in the
  rate estimator, from Julian Anastasov.

* Fix race that may lead to object use after release, when invoking
  ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
  Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:39:35 -04:00
Manish Chopra
1ed98ed55d qlcnic: Fix SR-IOV configuration
o Interface needs to be brought down and up while configuring SR-IOV.
  Protect interface up/down using rtnl_lock()/rtnl_unlock()

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:34:59 -04:00
Ricardo Ribalda
7167cf0e8c ll_temac: Reset dma descriptors indexes on ndo_open
The dma descriptors indexes are only initialized on the probe function.

If a packet is on the buffer when temac_stop is called, the dma
descriptors indexes can be left on a incorrect state where no other
package can be sent.

So an interface could be left in an usable state after ifdow/ifup.

This patch makes sure that the descriptors indexes are in a proper
status when the device is open.

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:31:35 -04:00
Nicolas Dichtel
4590672357 skbuff: size of hole is wrong in a comment
Since commit c93bdd0e03 ("netvm: allow skb allocation to use PFMEMALLOC
reserves"), hole size is one bit less than what is written in the comment.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:32:39 -07:00
David S. Miller
d9a71f97d5 Merge branch 'fixes-for-3.12' of git://gitorious.org/linux-can/linux-can
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:31:05 -07:00
Salam Noureddine
9260d3e101 ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:28:58 -07:00
Salam Noureddine
e2401654dd ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
It is possible for the timer handlers to run after the call to
ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
function in order to do proper cleanup when the refcnt reaches 0.
Otherwise, the refcnt can reach zero without the in_device being
destroyed and we end up leaking a reference to the net_device and
see messages like the following,

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Tested on linux-3.4.43.

Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:28:56 -07:00
Bartlomiej Zolnierkiewicz
437a3ae1d0 ethernet: moxa: fix incorrect placement of __initdata tag
__initdata tag should be placed between the variable name and equal
sign for the variable to be placed in the intended .init.data section.

In this particular case __initdata is incorrect as moxart_mac_driver
can be used after the driver gets initialized.

Also while at it static-ize moxart_mac_driver.

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:16:43 -07:00
Hannes Frederic Sowa
3da812d860 ipv6: gre: correct calculation of max_headroom
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the

	if (encap_limit >= 0) {
		max_headroom += 8;
		mtu -= 8;
	}

increments an uninitialized variable before max_headroom was reset.

Found with coverity: 728539

Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 22:04:09 -07:00
Aida Mynzhasova
e58f6f4fb4 powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
Currently IEEE 1588 timer reference clock source is determined through
hard-coded value in gianfar_ptp driver. This patch allows to select ptp
clock source by means of device tree file node.

For instance:

	fsl,cksel = <0>;

for using external (TSEC_TMR_CLK input) high precision timer
reference clock.

Other acceptable values:

	<1> : eTSEC system clock
	<2> : eTSEC1 transmit clock
	<3> : RTC clock input

When this attribute isn't used, eTSEC system clock will serve as
IEEE 1588 timer reference clock.

Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
Acked-by: Kumar Gala <galak@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:17:16 -07:00
David S. Miller
3f3f0960af Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
This reverts commit 894116bd0e.

I applied the wrong version of this patch, correct
version coming up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:16:17 -07:00
Neil Horman
5a0068deb6 bonding: Fix broken promiscuity reference counting issue
Recently grabbed this report:
https://bugzilla.redhat.com/show_bug.cgi?id=1005567

Of an issue in which the bonding driver, with an attached vlan encountered the
following errors when bond0 was taken down and back up:

dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
device might be broken.

The error occurs because, during __bond_release_one, if we release our last
slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
notification.  With an attached vlan, the vlan may see that the vlan and bond
mac address were in sync, but no longer are.  This triggers a call to dev_uc_add
and dev_set_rx_mode, which enables IFF_PROMISC on the bond device.  Then, when
we complete __bond_release_one, we use the current state of the bond flags to
determine if we should decrement the promiscuity of the releasing slave.  But
since the bond changed promiscuity state during the release operation, we
incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
to begin with, causing the above error

Fix is pretty simple, just cache the bonding flags at the start of the function
and use those when determining the need to set promiscuity.

This is also needed for the ALLMULTI flag

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Mark Wu <wudxw@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 21:10:55 -07:00
Eric Dumazet
c9eeec26e3 tcp: TSQ can use a dynamic limit
When TCP Small Queues was added, we used a sysctl to limit amount of
packets queues on Qdisc/device queues for a given TCP flow.

Problem is this limit is either too big for low rates, or too small
for high rates.

Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
auto sizing, it can better control number of packets in Qdisc/device
queues.

New limit is two packets or at least 1 to 2 ms worth of packets.

Low rates flows benefit from this patch by having even smaller
number of packets in queues, allowing for faster recovery,
better RTT estimations.

High rates flows benefit from this patch by allowing more than 2 packets
in flight as we had reports this was a limiting factor to reach line
rate. [ In particular if TX completion is delayed because of coalescing
parameters ]

Example for a single flow on 10Gbp link controlled by FQ/pacing

14 packets in flight instead of 2

$ tc -s -d qd
qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
buckets 1024 quantum 3028 initial_quantum 15140
 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
requeues 6822476)
 rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
  2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
  2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit

Note that sk_pacing_rate is currently set to twice the actual rate, but
this might be refined in the future when a flow is in congestion
avoidance.

Additional change : skb->destructor should be set to tcp_wfree().

A future patch (for linux 3.13+) might remove tcp_limit_output_bytes

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 20:41:57 -07:00
Linus Torvalds
f927318840 NFS client bugfixes for 3.12
- Stable fix for Oopses in the pNFS files layout driver
 - Fix a regression when doing a non-exclusive file create on NFSv4.x
 - NFSv4.1 security negotiation fixes when looking up the root filesystem
 - Fix a memory ordering issue in the pNFS files layout driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSSfNNAAoJEGcL54qWCgDybGYQAJGm4/vd7/rWZ49KIjGFGkFo
 sCt0UOK6Y6ALhUOIlIreXsQ+Iwn9aAoIIRgx8UwnB+hO6PGnSyFuJZZx1KE8V2kj
 6JlE5FbsWV+3uFQzNJQsNcoj7NZMzIRZT7x+7QansBOdSQjgQc3ig2sAMWREZjn8
 GxMOl8FNRrnP8gRom30ZScgMp1YDM8J1ql80S/nbxh2NOLBsvgg9VapzJhhqkMyl
 b7WKX4Qbg4AeSaxIAIrIwcZ7L2YS09JGC40VSybQARs0/7J8fjOZPs7CmrUCoB5F
 DmT5vfEC4+dqDf8PMyoFVfxK5ua5Sb/FGQmagYYa8bSgY7Uq03akYI++co+4PZU1
 f3SN6CSvVffzGMdXAhUupOZQbkKvKFxR2MTGy8s7dxdkQudd4RioYPDmLfCHlbmb
 VY5kFh/Duqso1FCrcfvZoC88ElrWUz5yoVzZyECOEwCs1wjI6bjmGdSqCSbU75Lm
 Z0XOAn1cStwFvGwCbGZPUzlvueji3coDdCFPBXAOFHzisLYoo/Lxenw7l5D1qM5b
 02iZllcIo340vw8wxHZxVebecFo33P90X1gjv0HQQkV/6EeNgq4D47SWTPxRq3Ai
 Dl9MFjTPl51oseDLrH6I/hBvcqjksB1M1+WjifT0bCIi3Y0HAea2U0wgweHS3vAd
 QHqIpIJxNHDjPBMDWEZW
 =ScfI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Stable fix for Oopses in the pNFS files layout driver
 - Fix a regression when doing a non-exclusive file create on NFSv4.x
 - NFSv4.1 security negotiation fixes when looking up the root
   filesystem
 - Fix a memory ordering issue in the pNFS files layout driver

* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Give "flavor" an initial value to fix a compile warning
  NFSv4.1: try SECINFO_NO_NAME flavs until one works
  NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
  NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
  NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
2013-09-30 17:10:26 -07:00
Peter Korsgaard
bf0ea63807 dm9601: fix IFF_ALLMULTI handling
Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
(pass undersized frames).

Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 19:48:59 -04:00
Linus Torvalds
522d6d38f8 Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  pidns: fix free_pid() to handle the first fork failure
  ipc,msg: prevent race with rmid in msgsnd,msgrcv
  ipc/sem.c: update sem_otime for all operations
  mm/hwpoison: fix the lack of one reference count against poisoned page
  mm/hwpoison: fix false report on 2nd attempt at page recovery
  mm/hwpoison: fix test for a transparent huge page
  mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
  block: change config option name for cmdline partition parsing
  mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
  mm: avoid reinserting isolated balloon pages into LRU lists
  arch/parisc/mm/fault.c: fix uninitialized variable usage
  include/asm-generic/vtime.h: avoid zero-length file
  nilfs2: fix issue with race condition of competition between segments for dirty blocks
  Documentation/kernel-parameters.txt: replace kernelcore with Movable
  mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
  kernel/kmod.c: check for NULL in call_usermodehelper_exec()
  ipc/sem.c: synchronize the proc interface
  ipc/sem.c: optimize sem_lock()
  ipc/sem.c: fix race in sem_lock()
  mm/compaction.c: periodically schedule when freeing pages
  ...
2013-09-30 14:32:32 -07:00
Oleg Nesterov
314a8ad0f1 pidns: fix free_pid() to handle the first fork failure
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:03 -07:00
Davidlohr Bueso
4271b05a22 ipc,msg: prevent race with rmid in msgsnd,msgrcv
This fixes a race in both msgrcv() and msgsnd() between finding the msg
and actually dealing with the queue, as another thread can delete shmid
underneath us if we are preempted before acquiring the
kern_ipc_perm.lock.

Manfred illustrates this nicely:

Assume a preemptible kernel that is preempted just after

    msq = msq_obtain_object_check(ns, msqid)

in do_msgrcv().  The only lock that is held is rcu_read_lock().

Now the other thread processes IPC_RMID.  When the first task is
resumed, then it will happily wait for messages on a deleted queue.

Fix this by checking for if the queue has been deleted after taking the
lock.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: <stable@vger.kernel.org> 	[3.11]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:03 -07:00
Manfred Spraul
0e8c665699 ipc/sem.c: update sem_otime for all operations
In commit 0a2b9d4c79 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).

But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.

The fix is simple:
Non-alter operations must update sem_otime.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Jia He <jiakernel@gmail.com>
Tested-by: Jia He <jiakernel@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:03 -07:00
Wanpeng Li
fb31ba30fb mm/hwpoison: fix the lack of one reference count against poisoned page
The lack of one reference count against poisoned page for hwpoison_inject
w/o hwpoison_filter enabled result in hwpoison detect -1 users still
referenced the page, however, the number should be 0 except the poison
handler held one after successfully unmap.  This patch fix it by hold one
referenced count against poisoned page for hwpoison_inject w/ and w/o
hwpoison_filter enabled.

Before patch:

[   71.902112] Injecting memory failure at pfn 224706
[   71.902137] MCE 0x224706: dirty LRU page recovery: Failed
[   71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users

After patch:

[   94.710860] Injecting memory failure at pfn 215b68
[   94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:03 -07:00
Wanpeng Li
2d421acd15 mm/hwpoison: fix false report on 2nd attempt at page recovery
If the page is poisoned by software injection w/ MF_COUNT_INCREASED
flag, there is a false report during the 2nd attempt at page recovery
which is not truthful.

This patch fixes it by reporting the first attempt to try free buddy
page recovery if MF_COUNT_INCREASED is set.

Before patch:

[  346.332041] Injecting memory failure at pfn 200010
[  346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed

After patch:

[  297.742600] Injecting memory failure at pfn 200010
[  297.742941] MCE 0x200010: free buddy page recovery: Delayed

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00
Wanpeng Li
e76d30e20b mm/hwpoison: fix test for a transparent huge page
PageTransHuge() can't guarantee the page is a transparent huge page
since it returns true for both transparent huge and hugetlbfs pages.

This patch fixes it by checking the page is also !hugetlbfs page.

Before patch:

[  121.571128] Injecting memory failure at pfn 23a200
[  121.571141] MCE 0x23a200: huge page recovery: Delayed
[  140.355100] MCE: Memory failure is now running on 0x23a200

After patch:

[   94.290793] Injecting memory failure at pfn 23a000
[   94.290800] MCE 0x23a000: huge page recovery: Delayed
[  105.722303] MCE: Software-unpoisoned page 0x23a000

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00
Wanpeng Li
20cb6cab52 mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
madvise_hwpoison won't check if the page is small page or huge page and
traverses in small page granularity against the range unconditionally,
which result in a printk flood "MCE xxx: already hardware poisoned" if
the page is a huge page.

This patch fixes it by using compound_order(compound_head(page)) for
huge page iterator.

Testcase:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>

#define PAGES_TO_TEST 3
#define PAGE_SIZE	4096 * 512

int main(void)
{
	char *mem;
	int i;

	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
			PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, 0, 0);

	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
		return -1;

	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

	return 0;
}

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00
Paul Gortmaker
080506ad0a block: change config option name for cmdline partition parsing
Recently commit bab55417b1 ("block: support embedded device command
line partition") introduced CONFIG_CMDLINE_PARSER.  However, that name
is too generic and sounds like it enables/disables generic kernel boot
arg processing, when it really is block specific.

Before this option becomes a part of a full/final release, add the BLK_
prefix to it so that it is clear in absence of any other context that it
is block specific.

In addition, fix up the following less critical items:
 - help text was not really at all helpful.
 - index file for Documentation was not updated
 - add the new arg to Documentation/kernel-parameters.txt
 - clarify wording in source comments

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00
Vlastimil Babka
eadb41ae82 mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
The function __munlock_pagevec_fill() introduced in commit 7a8010cd36
("mm: munlock: manual pte walk in fast path instead of
follow_page_mask()") uses pmd_addr_end() for restricting its operation
within current page table.

This is insufficient on architectures/configurations where pmd is folded
and pmd_addr_end() just returns the end of the full range to be walked.
In this case, it allows pte++ to walk off the end of a page table
resulting in unpredictable behaviour.

This patch fixes the function by using pgd_addr_end() and pud_addr_end()
before pmd_addr_end(), which will yield correct page table boundary on
all configurations.  This is similar to what existing page walkers do
when walking each level of the page table.

Additionaly, the patch clarifies a comment for get_locked_pte() call in the
function.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Cc: Jörn Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30 14:31:02 -07:00