Pull x86 fixes from Ingo Molnar.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, kvm: Call restore_sched_clock_state() only after %gs is initialized
x86: Use -mno-avx when available
x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility
x86: Preserve lazy irq disable semantics in fixup_irqs()
Because of the inclusion of skeleton.dtsi, the memory node is
named "memory" we where not modifying the already included one
but creating a new one. It caused bad memory node detection during
early_init_dt_scan_memory() so we modify them.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: devicetree-discuss@lists.ozlabs.org
SPI chip select pins have to be checked by gpio_is_valid().
The USB host overcurrent_pin checking was missing.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Compilation error in case of non-DT configuration without this
of.h header file.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Add comments to NAND "gpios" property to make it clearer.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Change number of ports to 3 for newer SoCs. Modify pdata structure
and ohci-at91 code that was dealing with ports information and check
of port indexes.
Several coding style errors have been addresses as the patch was touching
affected lines of code and was producing errors while run through
checkpatch.pl.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
The DT information are filled in a pdata structure and then passed on
to the usual check code of the probe function. Thus we do not need to
redo the gpio checking and irq configuration in the DT-related code.
On the other hand, we setup GPIO direction in driver for vbus and
overcurrent. It will be useful when moving to pinctrl subsystem.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Change vbus gpio configuration in .dts files to switch to
active low configuration.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: stable <stable@vger.kernel.org>
Due to an error while handling vbus_pin_active_low in ohci-at91 driver,
the specification of this property was not good in devices/board files.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Cc: stable <stable@vger.kernel.org> [3.2+]
The information is not properly taken into account
for {get|set}_power() functions.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org> [3.2+]
addruart cannot read from the physical address of the chipid
register, that will fail as soon as the mmu is turned on.
Fixing it to read from the physical or virtual address depending
on the mmu state also does not work, because there is a period
between head.S and exynos_map_io where the mmu is on, the uart
is mapped and used, but the chipid mapping is not yet present.
Fix addruart to use the ARM Main ID cp15 register to determine
if the core is Cortex A15 (EXYNOS5) or not (EXYNOS4).
Signed-off-by: Colin Cross <ccross@android.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Thomas Abraham <thomas.abraham@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPfFQCAAoJEBLB8Bhh3lVKnMAP/je7vKTEJeQkFikqwmWJzssc
xMHRXLEgrh86RQ5BxHOuiTemKIM6BtwylZQHSXl+idOkPVrHBwQ1xdRuB5SXF0Fc
gBFYZXL9RUKh7wQYwvcIibQtbF75kJYqU5GlrIt5l6I8qY6xNuCRjuSVtBnw4v7j
JVHpGDAzASaGX8KMqijE97JrQOp/08/+C0a+p0Xj2GmaJp7vsgfGAj6vbsOCk7xA
dWpTh3wQnhM0g+i8+N2o3+DAgrJ6PQo3JUgBWS23u7bk4DU2Gj7LlPKhiCGu3PdP
YRNuX+nDkNJs0I6KK8m0W+9u8nEzmPXp7DM40162FFXdIzlwYxMK9zUM9eXxr1yY
wBFhQ3H9tS32gJzsNORkp8vDLFVqLV+uEhutDS3EaWneg7wwCrU6E3NRD1PyO7Me
MzsWX/U1DFVZgG2/yOTKD9sJWtQ6g/D9NFUO2q6hGkk1O+kknGoAKpIXm/ok20oC
kWNexfVFPnagixvntdeSQQvy9SEyEL+bVf8OMfv115eOeHsJTg85eyvOixPm4gLE
MWAwkZ3hnLcL9Ugv45KXHWR3lgHLbyGk11oGC2Nu2WlRLdLnJpmdfQO/4QFtfSwL
gUL23f6nWmUcJTWFAS+gP2iFALc6HvWHb6w9Jn4ZKZw5AjDCNLqpoQRyCG18h2Du
hlfnPEiB2DGwbb0yZw+M
=2kpn
-----END PGP SIGNATURE-----
Merge tag 'mce-fix-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull MCE fixlet from Borislav Petkov:
"One fix which makes MCE decoding much more "liberal" wrt families."
* tag 'mce-fix-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
MCE, AMD: Drop too granulary family model checks
Rename EMAC clocks to match driver expectations: both davinci_emac and
davinci_mdio drivers call clk_get(dev, NULL) so we have to provide
("davinci_emac", NULL) and ("davinci_mdio.0", NULL) clocks instead of
("davinci_emac", "emac_clk") and ("davinci_emac", "phy_clk") resp.
Cc: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Matt Porter <mporter@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
- some RAID levels didn't clear up properly if md_integrity_register
failed
- a 'check' of RAID5/RAID6 doesn't actually read any data
since a recent patch - so fix that (and mark for -stable)
- a couple of other minor bugs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUAT3qOCznsnt1WYoG5AQK/JA//dHuk1IbB2KglxB6Cb6wDtoE9wqBWuN82
W+Squ6u8e2iwwBSaKbuy+0+qb/ew20iRAHbHXt6LlNqRZCWbWuyBvjxxX0DnpVgF
eRJajccy86BfHpsE7R794FHohBDW1dAlUSFXfF4o9y1QAqcyxT4v1y06p74SAJ4l
3xmpp5oc7Qo3/CIC2ZpSgG+ileNn+KNiDafLz1adhrnd+EmewnTvrW84TWW91jzN
eLE/+cpf1OlqyEwVsatxHf3SxbH2kd183paBeQs3qw8dwL4K6pag7M8Ca4tlxl88
KO/+APSZRGY/RV3GWxcDjsWgJFWZC7Eq9cgYvpXcovDNb2r9vuJPipiVaa8BFcLd
WBmULR0dwpoJm+OL2dJRncM17jn2+Q5JIA8VkiA/aDKXk1qTWDGIh/yoAs4RmTfI
PdehSo9puYvBNUr3NW4YTG1PuPssUXLMWeXM6Q2LgPtaC0RYpZddI33x1rVK5Ro7
5XuuyyNiSyUUkXJN2TVvsVQ6+d7Lf/Z9WBSE9kU1LaaDfd2hnZwCRhSgdhiZqJJM
K3aagqgRsEM9SqLdQqZKCJVZ1zXPqgIcd20ByNdkO3ERUI1veAjlJlNPRHQ3ZEnR
ELp0cvRvqKULqIPRt+EbO6tNjjcHS5Iq++JPoSi8fDgIGzqGpfFzpK5dL6C7tLep
IkR+oDmYcAg=
=IBql
-----END PGP SIGNATURE-----
Merge tag 'md-3.4-fixes' of git://neil.brown.name/md
Pull assorted md fixes from Neil Brown:
- some RAID levels didn't clear up properly if md_integrity_register
failed
- a 'check' of RAID5/RAID6 doesn't actually read any data since a
recent patch - so fix that (and mark for -stable)
- a couple of other minor bugs.
* tag 'md-3.4-fixes' of git://neil.brown.name/md:
md/raid1,raid10: don't compare excess byte during consistency check.
md/raid5: Fix a bug about judging if the operation is syncing or replacing
md/raid1:Remove unnecessary rcu_dereference(conf->mirrors[i].rdev).
md: Avoid OOPS when reshaping raid1 to raid0
md/raid5: fix handling of bad blocks during recovery.
md/raid1: If md_integrity_register() failed,run() must free the mem
md/raid0: If md_integrity_register() fails, raid0_run() must free the mem.
md/linear: If md_integrity_register() fails, linear_run() must free the mem.
Pull ARM fixes from Russell King:
"Nothing too big here, just small fixes."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: fix more fallout from 9f97da78bf (Disintegrate asm/system.h for ARM)
ARM: fix bios32.c build warning
ARM: 7337/1: ptrace: fix ptrace_read_user for !CONFIG_MMU platforms
ARM: fix missing bug.h include in arch/arm/kernel/insn.c
ARM: sa11x0: fix build errors from DMA engine API updates
Pull Sparc fixes from David Miller:
"One build regression and one serial probe regression fix on sparc."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
serial/sunzilog: fix keyboard on SUN SPARCstation
sparc: pgtable_64: change include order
To fix:
In file included from kernel/exit.c:61:
arch/avr32/include/asm/mmu_context.h: In function 'enable_mmu':
arch/avr32/include/asm/mmu_context.h:135: error: implicit declaration of function 'nop'
It needs an include of the new file created in commit ae47394658
("Disintegrate asm/system.h for AVR32"), but since that file only
contains "nop", and since other arch already have precedent of putting
nop in asm/barrier.h we should just delete the new file and put nop in
barrier.h
Suggested-and-acked-by: David Howells <dhowells@redhat.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
clk_disable_unused is invoked when CONFIG_OMAP_RESET_CLOCKS=y.
Since clk_disable_unused is called as lateinitcall, there can
be more than a few workqueues executing off secondary CPU(s).
The current code does the following:
a) checks if clk is unused
b) holds lock
c) disables clk
d) unlocks
Between (a) and (b) being executed on CPU0, It is possible to
have a driver executing on CPU1 which could do a get_sync->clk_get
(and increase the use_count) of the clock which was just about
to be disabled by clk_disable_unused.
We ensure instead that the entire list traversal is protected by
the lock allowing for parent child clock traversal which could be
potentially be done by runtime operations to be safe as well.
Reported-by: Todd Poynor <toddpoynor@google.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
This reverts commit d06221c061.
It turns out to trigger the "BUG_ON(!PageCompound(page))" in kfree(),
apparently because the code ends up trying to free somethng that was
never kmalloced in the first place.
BenH points out that the patch was untested and wasn't meant to go into
the upstream kernel that quickly in the first place.
Backtrace:
bios_shadow
bios_shadow_prom
nv_mask
init_io
bios_shadow
nouveau_bios_init
NVReadVgaCrtc
NVSetOwner
nouveau_card_init
nouveau_load
Reported-by: Meelis Roos <mroos@linux.ee>
Requested-by: Dave Airlie <airlied@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CLKS signal for McBSP ports can be selected from internal (PRCM) or
external (ABE_CLKS pin) source. To be able to use existing code we
need to create clock aliases consistent among OMAP2/3/4.
Based on a patch from Péter Ujfalusi <peter.ujfalusi@ti.com>;
the patch description above is his.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Péter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
The commit 996bc8aebd (mmc: sh_mobile_sdhi:
do not manage PM clocks manually) modified the sh_mobile_sdhi driver to
remove the clk_enable/clk_disable. So, we need to change
the "CLKDEV_CON_ID" to "CLKDEV_DEV_ID".
If we don't change this, we will see the following error from the driver:
sh_mobile_sdhi sh_mobile_sdhi.0: timeout waiting for hardware interrupt (CMD52)
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Commit 2a9f5a4d45 "OMAP3 clock: remove unnecessary duplicate of dpll4_m2_ck,
added for 36xx" consolidated dpll4 clock structures between 34xx and 36xx,
but left 34xx CLKSEL masks for most dpll4 related clocks, which causes
clock code to not behave correctly when booting on DM3730 with higher
(36xx only) divisors set:
[ 0.000000] WARNING: at arch/arm/mach-omap2/clkt_clksel.c:375 omap2_init_clksel_parent+0x104/0x114()
[ 0.000000] clock: dpll4_m3_ck: init parent: could not find regval 0
[ 0.000000] WARNING: at arch/arm/mach-omap2/clkt_clksel.c:194 omap2_clksel_recalc+0xd4/0xe4()
[ 0.000000] clock: Could not find fieldval 0 for clock dpll4_m3_ck parent dpll4_ck
Fix this by switching to 36xx masks, as valid divisors will be limited
by clksel_rate lists.
Cc: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
If transceiver is attached to a MMC host of ES2.1 OMAP35xx, it seems
2.1.1.128 erratum doesn't apply and there is no data corruption,
probably because of different signal timing. The workaround for this
erratum disables multiblock reads, which causes dramatic loss of
performance (over 75% slower), so avoid it when transceiver is present.
Cc: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
[paul@pwsan.com: edited commit message slightly]
Signed-off-by: Paul Walmsley <paul@pwsan.com>
According to the 4430 ES2.0 TRM vX Table 3-744 "CM_EMU_CLKSTCTRL",
the emu_sys clockdomain data in mainline is incorrect.
The emu_sys clockdomain does not support the DISABLE_AUTO state, and
instead it supports the FORCE_WAKEUP state.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Benoît Cousson <b-cousson@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Will Deacon <will.deacon@arm.com>
MCA details seldom change inbetween the models of a family so don't
be too conservative and enable decoding on everything starting from
K8 onwards. Minor adjustments can come in later but most importantly,
we have some decoding infrastructure in place for upcoming models by
default.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
On a system running glibc trunk perf doesn't build:
CC builtin-sched.o
builtin-sched.c: In function ‘get_cpu_usage_nsec_parent’: builtin-sched.c:399:16: error: storage size of ‘ru’ isn’t known builtin-sched.c:403:2: error: implicit declaration of function ‘getrusage’ [-Werror=implicit-function-declaration]
[...]
Fix it by including sys/resource.h.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120404084527.GA294@x4
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The keyboard on my SUN SPARCstation 5 no longer worked.
The culprint was: d4e33fac24
("serial: Kill off NO_IRQ")
Fix up logic for no irq / irq so the keyboard works again.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code cleanup of cifs_parse_mount_options resulted in a new bug being
introduced in the parsing of the UNC. This results in vol->UNC being
modified before vol->UNC was allocated.
Reported-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This driver was recently moved from IIO (where it worked) to hwmon (where
it doesn't.) This breakage occured because the hwmon version neglected to
correctly initialize a reference to spi_dev in its drvdata. The result is a
segfault every time the temperature is queried.
Signed-off-by: Graeme Smecher <gsmecher@threespeedlogic.com>
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
We have to decrement the conntrack counter if we fail to access the
zone extension.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error path misses putting the timeout object. This patch adds
new function xt_ct_tg_timeout_put() to put the timeout object.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI is disabled during suspend and needs to be enabled on resume. Without
this the driver locks up during resume in rtl_reset_work() trying to disable
NAPI again.
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The password parser has an unnecessary check for a NULL value which
triggers warnings in source checking tools. The code contains artifacts
from the old parsing code which are no longer required.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Commit 621b4d6 updated the bnx2x driver to a new FW version, but lacked
a commit to a header file with changes to the firmware's interface.
The missing interface change causes iscsi and fcoe to misbehave with the
updated firmware.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes Auto Power Saving configuration in ip101a_config_init
which was broken as there is no phy register write followed after
setting IP101A_APS_ON flag.
This patch also fixes the return value of ip101a_config_init.
Without this patch ip101a_config_init returns 2 which is not an error
accroding to IS_ERR and the mac driver will continue accessing 2 as
valid pointer to phy_dev resulting in memory fault.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In rare circumstances, a descriptor writeback flush may not work if it
arrives on a specific clock cycle as a writeback request is going out.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the adapter is closed while it is simultaneously going through a
reset, it can cause a null-pointer dereference when the two different code
paths simultaneously cleanup up the Tx/Rx resources.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix up code so that changes in DCB settings
are detected only when ixgbe_dcbnl_set_all is called.
Previously, a series of 'change' commands followed by
a call to ixgbe_dcbnl_set_all() would always be handled
as a HW change - even if the net change was zero.
This patch checks for this case of no actual change and
skips going through the HW set process.
Without this fix, the link could reset and result in
a link flap.
The core change in this patch is to check for changes
in the ixgbe_copy_dcb_cfg() routine - and return
a bitmask of detected changes. The other
places where changes were detected previously can be removed.
Signed-off-by: Eric Multanen <eric.w.multanen@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now the helper function from filter.c for negative offsets is exported,
it can be used it in the jit to handle negative offsets.
First modify the asm load helper functions to handle:
- know positive offsets
- know negative offsets
- any offset
then the compiler can be modified to explicitly use these helper
when appropriate.
This fixes the case of a negative X register and allows to lift
the restriction that bpf programs with negative offsets can't
be jited.
Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is renamed to make it a little more clear what it does.
It is not added to any .h because it is not for general consumption, only for
bpf internal use (and so by the jits).
Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The explanation of ip_local_port_range in
Documentation/networking/ip-sysctl.txt contains several factual
errors:
- The default value of ip_local_port_range does not depend on the
amount of memory available in the system.
- tcp_tw_recycle is not enabled by default.
- 1024-4999 is not the default value.
- Etc.
Clean up the mess.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.
4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)
This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)
In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.
In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>