linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-11 21:52:04 +00:00

Author	SHA1	Message	Date
Johannes Berg	c7c477b52c	mac80211: don't warn on AID field without top two MSBs set While the change between 802.11-2012 and 802.11-2016 to move from requiring APs to set the two top bits to now requiring them to be cleared was apparently unintentional and will be fixed, clients should either way assume that the top five bits are reserved and ignore them. Implement that in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:20:03 +01:00
Johannes Berg	768075ebc2	nl80211: add a few extended error strings to key parsing This mostly serves as an example for how to add error strings and erroneous attribute pointers. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:19:45 +01:00
Sergey Matyukevich	6c2fb1e652	cfg80211: cleanup signal strength units notation Both cfg80211_rx_mgmt and cfg80211_report_obss_beacon functions send reports to userspace using NL80211_ATTR_RX_SIGNAL_DBM attribute w/o any processing of their input signal values. Which means that in order to match userspace tools expectations, input signal values for those functions are supposed to be in dBm units. This patch cleans up comments, variable names, and trace reports for those functions, replacing confusing 'mBm' by 'dBm'. Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:19:31 +01:00
Tova Mussai	9ae3b172e8	cfg80211: IBSS: Add support for static WEP in driver for IBSS Add support for drivers that implement static WEP internally for IBSS. Add the WEP keys to the IBSS params struct, that will allow the driver to use the keys in the join flow, and not only after the connection. Signed-off-by: Tova Mussai <tova.mussai@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:19:21 +01:00
Luca Coelho	c7976f5272	mac80211: remove BUG() when interface type is invalid In the ieee80211_setup_sdata() we check if the interface type is valid and, if not, call BUG(). This should never happen, but if there is something wrong with the code, it will not be caught until the bug happens when an interface is being set up. Calling BUG() is too extreme for this and a WARN_ON() would be better used instead. Change that. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:19:02 +01:00
Sara Sharon	2316380f84	mac80211: call synchronize_net once in the restart flow Currently the restart flow enables RX back, and then proceeds to tear down RX and TX aggregations. The TX aggregation tear down calls synchronize_net(), which waits for packet receiving to be done. This is done for every session, while RX processing is already active, and in some reproductions it takes up to 3 seconds. Add a call once in the restart_work, before we have traffic active again, and remove the subsequent calls when tearing down the aggregation. This requires to move down the code that turns off the reconfig flag in order to be able to test it in _ieee80211_stop_tx_ba_session(). Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:18:56 +01:00
Emmanuel Grumbach	9fef654433	mac80211: always update the PM state of a peer on MGMT / DATA frames The 2016 version of the spec is more generic about when the AP should update the power management state of the peer: the AP shall update the state based on any management or data frames. This means that even non-bufferable management frames should be looked at to update to maintain the power management state of the peer. This can avoid problematic cases for example if a station disappears while being asleep and then re-appears. The AP would remember it as in power save, but the Authentication frame couldn't be used to set the peer as awake again. Note that this issues wasn't really critical since at some point (after the association) we would have removed the station and created another one with all the states cleared. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:18:43 +01:00
Adiel Aloni	e16ea4bb51	mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED Enforce using PS_MANUAL_POLL in ps hwsim debugfs to trigger a poll, only if PS_ENABLED was set before. This is required due to commit c9491367b759 ("mac80211: always update the PM state of a peer on MGMT / DATA frames") that enforces the ap to check only mgmt/data frames ps bit, and then update station's power save accordingly. When sending only ps-poll (control frame) the ap will not be aware that the station entered power save. Setting ps enable before triggering ps_poll, will send NDP with PM bit enabled first. Signed-off-by: Adiel Aloni <adiel.aloni@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:18:15 +01:00
Johannes Berg	a9d09bc1bc	mac80211: make __ieee80211_start_rx_ba_session static The function is only used with the file, so make it static. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:16:05 +01:00
Yingying Tang	e2fb1b8392	mac80211: enable TDLS peer buffer STA feature Allow drivers to set the buffer station extended capability for TDLS links, with a new hardware flag indicating this. Signed-off-by: Yingying Tang <yintang@qti.qualcomm.com> [change commit log/documentation wording] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:16:05 +01:00
Johannes Berg	d559e303b1	mac80211: avoid looking up tid_tx/tid_rx from timers There's no need to re-lookup the data structures now that we actually get them immediately with from_timer(), just avoid that. The struct has to be valid anyway, otherwise the timer object itself would no longer be valid, and we can't have a different version of the struct since only a single session per TID is permitted. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:16:04 +01:00
Gustavo A. R. Silva	02049ce27e	mac80211: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in some cases I replaced "fall through on else" and "otherwise fall through" comments with just a "fall through" comment, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-12-11 12:16:04 +01:00
David S. Miller	51e18a453f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-09 22:09:55 -05:00
Michal Hocko	f335195adf	kmemcheck: rip it out for real Commit `4675ff05de` ("kmemcheck: rip it out") has removed the code but for some reason SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or the merge mistake. Let's drop those leftovers as well. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-12-08 13:40:17 -08:00
Linus Torvalds	e9ef1fe312	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...	2017-12-08 13:32:44 -08:00
Linus Torvalds	77071bc6c4	media fixes for v4.15-rc3 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaKrOpAAoJEAhfPr2O5OEVK5sP/iHWDJtRw/WWBYOqF9cGFl0o ssh1iXJAsOmLjaARMOkxQBPLwTvVMreux2fHow/mukrXB8BIR7OmflBaQRLM3lbW xm9G4yarXf7xgkxngisemQrJiweNyaNDX9P0BqJLV55Xhp+rO2Q6rutspho3xzoo Pmz7bgnt0FjBKz+0LWnKnzozdNinj2uhTdBOTgm9rUuUUGiAVTAQSNUq2e5Gzw3m DhO4UPOJDRmnZ6Rqldq2pD2wzmRfbVonirz7IEh7j2opcAoCM2TnmM+Z9C5zjlUj XVFagiR+XG8lsh2zvHdweter4X9DqLBlMbBDVAQ+vH+xhBuz63Yqq1pSIvXA342U rs1X182FSyE+pOxipuae94csHkyzlb9tmzoiUoItU+QXi8Wlg8pLH6GFQe/72t/g HpcXMBq2lflc2KDstx+QGs/G+1EYtz64vUXZ4pvuIitrhdbd5ulLh7Y7nb7qfEym WEHpaJwxh4MZh6RzB2tq+Tvy0/sD1krNjPTXtf2CuTdFNUuvk8QenRW1q2YVA06L UOTxRpfPYUPwhGKOoKb2IJ5g1xoztoqMaafdsUJ16H7fcuVtE9xQayVCVhaBuo08 4g7SFCb7MhNLFG1Su291u9PKYlvGhCJaDPKE/6E+IM6szxt7h/kveYsdb05yWOoB di+cIWZkCRUzbO7fxz4J =xoHW -----END PGP SIGNATURE----- Merge tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "A series of fixes for the media subsytem: - The largest amount of fixes in this series is with regards to comments that aren't kernel-doc, but start with "/*". A new check added for 4.15 makes it to produce a huge* amount of new warnings (I'm compiling here with W=1). Most of the patches in this series fix those. No code changes - just comment changes at the source files - rc: some fixed in order to better handle RC repetition codes - v4l-async: use the v4l2_dev from the root notifier when matching sub-devices - v4l2-fwnode: Check subdev count after checking port - ov 13858 and et8ek8: compilation fix with randconfigs - usbtv: a trivial new USB ID addition - dibusb-common: don't do DMA on stack on firmware load - imx274: Fix error handling, add MAINTAINERS entry - sir_ir: detect presence of port" * tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits) media: imx274: Fix error handling, add MAINTAINERS entry media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices media: v4l2-fwnode: Check subdev count after checking port media: et8ek8: select V4L2_FWNODE media: ov13858: Select V4L2_FWNODE media: rc: partial revert of "media: rc: per-protocol repeat period" media: dvb: i2c transfers over usb cannot be done from stack media: dvb-frontends: complete kernel-doc markups media: docs: add documentation for frontend attach info media: dvb_frontends: fix kernel-doc macros media: drivers: remove "/**" from non-kernel-doc comments media: lm3560: add a missing kernel-doc parameter media: rcar_jpu: fix two kernel-doc markups media: vsp1: add a missing kernel-doc parameter media: soc_camera: fix a kernel-doc markup media: mt2063: fix some kernel-doc warnings media: radio-wl1273: fix a parameter name at kernel-doc macro media: s3c-camif: add missing description at s3c_camif_find_format() media: mtk-vpu: add description for wdt fields at struct mtk_vpu media: vdec: fix some kernel-doc warnings ...	2017-12-08 13:18:47 -08:00
Linus Torvalds	4066aa72f9	i915, amdgpu + misc fixes -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaKedrAAoJEAx081l5xIa+CLkP/AkLTMW17WT/aktLfq4/+NQV WGcLxQw+SB9iYMffFUNdQdMqv3pKsDzeatifxNuoRnjvK6rNZDUFK11SoTEFCdA+ opI3zBPiyMVv71LtLF/r2kj3Vw5rnARdCg24FIzptGg/4j+G0EOb9uzd9qjmjof1 r2XD1f0mjIG6SREMYCZPJj/Q/ZGfWFKXxcatmf87sAoq4o2nfoQ1Sl0T4cnMV7yp FzzFKA/j6OaiVa0Tb1KiJOl8VnW5f9STanUynFcPtKd2mhDcVaEP4lAfBhHgccd8 ufmsyPJDadGQDT0iQY5E8n+ht124wg4gW8TKGqyxDfxuzTFj2k62WOUMH4eSjGSt x7SHEewTQLoV5kIPp0y00T2NWuA4/jmjjgllQZgqSFwnsv8fQ9/GwLNPWrUGM0+z 6sqzj7QRyn8GGuYblqVghVYj0bX3ovmYlLzOaKZ+pNSM34hLBFRenZQ5mKJ6WrcE SIKzuQm3HEC8KwNw72zZ6+zz5fc7x9KuURztwEPt6vOMnxz1PtjWSdHzfvja4cH2 6D2t0lZSqkM244FFG4ImFP6CDF+8vaY7i9YybjYteycWH7o9haVa869vCxUdNUWo DDghWKPRjbl7LJLJ14BxMrg0+sk+XWYcoZXjUAM0rYFFOWoFkVAlF8Sy3kgb5Pky wMLalQohWFbkZY60q02h =2f34 -----END PGP SIGNATURE----- Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "This pull is a bit larger than I'd like but a large bunch of it is license fixes, AMD wanted to fix the licenses for a bunch of files that were missing them, Otherwise a bunch of TTM regression fix since the hugepage support, some i915 and gvt fixes, a core connector free in a safe context fix, and one bridge fix" * tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits) drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk" drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage drm/i915: Call i915_gem_init_userptr() before taking struct_mutex drm/exynos: remove unnecessary function declaration drm/exynos: remove unnecessary descrptions drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU drm/exynos: Fix dma-buf import drm/ttm: swap consecutive allocated pooled pages v4 drm: safely free connectors from connector_iter drm/i915/gvt: set max priority for gvt context drm/i915/gvt: Don't mark vgpu context as inactive when preempted drm/i915/gvt: Limit read hw reg to active vgpu drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id() drm/i915/gvt: Emulate PCI expansion ROM base address register drm/ttm: swap consecutive allocated cached pages v3 drm/ttm: roundup the shrink request to prevent skip huge pool drm/ttm: add page order support in ttm_pages_put drm/ttm: add set_pages_wb for handling page order more than zero drm/ttm: add page order in page pool ...	2017-12-08 13:11:57 -08:00
Linus Torvalds	7267212c80	Merge tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull md fixes from Shaohua Li: "Some MD fixes. The notable one is a raid5-cache deadlock bug with dm-raid, others are not significant" * tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid1/10: add missed blk plug md: limit mdstat resync progress to max_sectors md/r5cache: move mddev_lock() out of r5c_journal_mode_set() md/raid5: correct degraded calculation in raid5_error	2017-12-08 13:03:02 -08:00
Linus Torvalds	78d9b04844	DeviceTree fixes for v4.15 (part2): - Fixes from overlay code rework. A trifecta of fixes to the locking, an out of bounds access, and a memory leak in of_overlay_apply(). - Clean-up at25 eeprom binding document - Remove leading '0x' in unit-addresses from binding docs -----BEGIN PGP SIGNATURE----- iQItBAABCAAXBQJaKrC8EBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcPY7Q/9 GOZDOhajewVoYo+LkKl8+1wtdih7QHOvziKY8Jn1Nd/o6ASXU5ltOxvk5iF/q4tz x7JFKMkUo8eyOy5p5XnkqAzOnQ+rP+aRvHxjeAiMtQZRtMcDJ6bv4FRMI4/hq610 QocAcxzblXJayy/Ad2sbuMr34O6QBkvnicMiLHZNb2IhCgk0nxBbyiNJDr868XE6 CwlkKgo+y0fFJKkW3y4v4jaMdMlmcxTPTY+4jOJoGnnD1LgNwNRgkTFCzRQLebBk E5D6XweMS7G0IOO8wLYEn70JYsNk4wRGjIvlZGR1e5REQNgT9H+1J8rXLxlWc5rk l2o4V3+nrBj3BpVCG5oQWBrwVsMZS8ExdnRnrGmZJ25C1EFzlRHGYtsvo6fvXsLs T2VfYPEDhN5IHZiUU/rcHfz1VpHjuBrtFStPQkFTSTfO52n22Q5KWBxHUqUjXbBZ phcPoHcDm+FfO4swkLuIP0wDAiNj8SJpJJNnHLrgIuy52UvXe/VD5BlY5RgXIZbg XhxAcsNKrqfH8ggfbM7ndXkUoFrDyl/dOupfwbtwzNqDQiPGvpjkr9hmE7XIL2aJ tyGahgkpZ3n49oGJPe5rTHKJacZBkl6rNI2CFQfERXyGYUmQSEU6AViSSyRG6ljK kXLmIzNPgJJF/I9YC3q8Wr6FUoOAapE7Zww/Cc83xJ8= =vBBN -----END PGP SIGNATURE----- Merge tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree fixes from Rob Herring: "Another set of DT fixes: - Fixes from overlay code rework. A trifecta of fixes to the locking, an out of bounds access, and a memory leak in of_overlay_apply() - Clean-up at25 eeprom binding document - Remove leading '0x' in unit-addresses from binding docs" * tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: overlay: Make node skipping in init_overlay_changeset() clearer of: overlay: Fix out-of-bounds write in init_overlay_changeset() of: overlay: Fix (un)locking in of_overlay_apply() of: overlay: Fix memory leak in of_overlay_apply() error path dt-bindings: eeprom: at25: Document device-specific compatible values dt-bindings: eeprom: at25: Grammar s/are can/can/ dt-bindings: Remove leading 0x from bindings notation of: overlay: Remove else after goto of: Spelling s/changset/changeset/ of: unittest: Remove bogus overlay mutex release from overlay_data_add()	2017-12-08 13:00:51 -08:00
Linus Torvalds	900add27f5	virtio: bugfixes A couple of minor bugfixes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJaKW26AAoJECgfDbjSjVRpDwIH/1tq1vcjd5i00nh+gQD7jWm2 qWrpcKbsS0ANTvhm9UNFY24BVoPuNpNrApQpuBxmgxE/XGqx7A+8xXhwPAM5lLiG uDSPB2nsfjvEUOde7bgeR+t6ay+Ki2UWKzY46lSoCTcIN7BSVFWQZonsGu2xLDzz kKpCtlobXRnXzeWm+fh1oOZu/cn/TuAF0mbb+6TUQqSsHAt6PQ3Hwsly2EmQV5xR of6Si/TnxFOkhZUKpezbuTU/g/20ZkHUSzgfrOuZLGonaIvZIE6BUm8m21/IgCbw WnwSqe66WsWF2vd0RDLFq4L3qaRyGn+W7iYo3KPbwIPEEotSRFMUgixBWBB3IgU= =Pf4w -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio bugfixes from Michael Tsirkin: "A couple of minor bugfixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: fix return value check in receive_mergeable() virtio_mmio: add cleanup for virtio_mmio_remove virtio_mmio: add cleanup for virtio_mmio_probe	2017-12-08 12:58:51 -08:00
Linus Torvalds	32abeb09ab	xen: fixes for 4.15-rc3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABAgAGBQJaKlu/AAoJELDendYovxMvaqoH/i7gN9xry7QkUM6RkwddGwYY v0rqaUo4WCW27yFOE7Bzej9Y+W92/eFPJnVUhc/quTVfV+uEjbs4PiAwuxSr+lIU X+BhBNbEi9C5RlRL1z75J0ZySyu6WXL2hsmPbc0wrrqdQikfiZ7bnRjdGAHh5C5C TijKQKGZTt6ccjIPUEZTIqeajOt/p7uxkCXPWhHQA1mudf9PVhsKyYnGdYp5gp8X KID+8XmKtAcSwPUz+eG9vGlGwmP28mH0BfCT0suC2uUI4o+PJFPqBTlfsco2kfHO NqVCgnMZs31Js8mdEVz8h2ZO8m2T5m1oml1zOeyDbgTJ8yjqgADy8K6Lm38clko= =ZHtb -----END PGP SIGNATURE----- Merge tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Just two small fixes for the new pvcalls frontend driver" * tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: Fix a check in pvcalls_front_remove() xen/pvcalls: check for xenbus_read() errors	2017-12-08 12:53:43 -08:00
Linus Torvalds	d90696ed61	powerpc fixes for 4.15 #4 One notable fix for kexec on Power9, where we were not clearing MMU PID properly which sometimes leads to hangs. Finally debugged to a root cause by Nick. A revert of a patch which tried to rework our panic handling to get more output on the console, but inadvertently broke reporting the panic to the hypervisor, which apparently people care about. Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon. Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJaKoWXExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAp Rw/+KRvwt1jt3vFKrWlcXQ4Mx4UTseSaBO7FsGwyANqNGUNvkIEIAZYu6M9x0LLh tfVowZdJ2vQrgdZy4Rd5zhIjzVaybyENMAMZFmGCxQUidORdibP2qT+3612FmQl0 rczQB4Ra1Jymw+42iwe4WQfyta9cvVgfk7D+1KVWaCXQ0lx8DynZ75yK+U0fensz FPQNdtkfC2D37IFrqtgGBS5YLkeQpfftm8C/eBG0n2tv8PO1KM5xwVU8Ovf5LoIm 8NbWL//H+zUOoU2jCGHDMfg1qLv9owScTMRtquQSmrE1i21mE2lLOSDSM105+AP2 7CVRMMkth8V9w/nauPq0a5OGyzJWtClI9qj2ZPWS2wPF331g58GUNJsEy7OQAJgO QZoqcCkpT5qarmxkcKJlYZGF6AZ/4mIBL9mucfQc/afEgRUqksaUKck0qD5SW28j fm3pPjlMyf2vMRKGgaE9/+by5N/Bmxy2VCoFSuhm1ZrQsIpZXtp/Mfylqz0msdhU VCt4T229S7rdCQTn2TyMNW+iVmjlgvR4OUXvba/eBz67gzGk4huLNB4EnEwHA/SK qhkTJqYbP9B/MBD9GrNLFzG5yZTTv+3OA/aehL0PEGouV7cgMEqyGtYw2afwggRC sf+veK/2apPMnlA5WItEa7JPWaTLsxljZ65acskb7S/4W8g= =wU60 -----END PGP SIGNATURE----- Merge tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix for kexec on Power9, where we were not clearing MMU PID properly which sometimes leads to hangs. Finally debugged to a root cause by Nick. A revert of a patch which tried to rework our panic handling to get more output on the console, but inadvertently broke reporting the panic to the hypervisor, which apparently people care about. Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in xmon. Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria" * tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xmon: Don't print hashed pointers in xmon powerpc/64s: Initialize ISAv3 MMU registers before setting partition table Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" powerpc/perf: Fix oops when grouping different pmu events	2017-12-08 12:52:09 -08:00
David S. Miller	fd29117aeb	linux-can-fixes-for-4.15-20171208 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAloqeVYTHG1rbEBwZW5n dXRyb25peC5kZQAKCRAe/sog7Dgc9tiqCADK4f/QYW5q5jC93A6JZSItI8vAK2+h 0s4MRTj9x+thBIGIhJ59uYBSTd374bvsWmrGdV7CBoGX4TnEJfGiMV77lpGnGRVG jpzk9cSFoAnE5UW2qZlF+JM8SNEFlU18MCQQlnMzKbSGerUAlveK+mcF5sJrqrQh CGZ9MH1Bp4Fz3WMRQ9hHzKWjTOhhM54qjPceCVTZM6I0RJam6I2lpVZPQeom9uVa r+F5Lv2ZOpNZc+8Pbu+L95YyivKKaQOPzeP4btFLNEHUyFDygHcv2iKRIn9MdEu2 2XfaDVKk2Ey/qWc782SLBxLOihnhWltwC7Kg1ZnrLhNZ6V5UbYQ5FzF4 =OMZE -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-4.15-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-12-08 this is a pull request of 6 patches for net/master. Martin Kelly provides 5 patches for various USB based CAN drivers, that properly cancel the URBs on adapter unplug, so that the driver doesn't end up in an endless loop. Stephane Grosjean provides a patch to restart the tx queue if zero length packages are transmitted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:53:54 -05:00
Girish Moodalbail	5e54b3c120	macvlan: fix memory hole in macvlan_dev Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack and reduce a memory hole. Fixes: `88ca59d1aa` (macvlan: remove unused fields in struct macvlan_dev) Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:51:46 -05:00
David S. Miller	03afb6e43a	wireless-drivers fixes for 4.15 Second set of fixes for 4.15. This time a lot of iwlwifi patches and two brcmfmac patches. Most important here are the MIC and IVC fixes for iwlwifi to unbreak 9000 series. iwlwifi * fix rate-scaling to not start lowest possible rate * fix the TX queue hang detection for AP/GO modes * fix the TX queue hang timeout in monitor interfaces * fix packet injection * remove a wrong error message when dumping PCI registers * fix race condition with RF-kill * tell mac80211 when the MIC has been stripped (9000 series) * tell mac80211 when the IVC has been stripped (9000 series) * add 2 new PCI IDs, one for 9000 and one for 22000 * fix a queue hang due during a P2P Remain-on-Channel operation brcmfmac * fix a race which sometimes caused a crash during sdio unbind * fix a kernel-doc related build error -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJaKp9UAAoJEG4XJFUm622bhXgH+wTtTVEH0lAOTtK+PyBkxkRH Q+55Yf1XZ9lNYxmfXhYgObusSbmeL8tClMuISCcQS9gX0um1Vuuud4CLemgO2V7R V1xuWTKjanaKf8PouKx9SUt1Fx6CsFdwlivJX+eZTfKlKYtwNbNX4onWl9GN2jVZ 5/2l+m3MJbTMMzarZTGLkBJqpTk8DGTNINtKeRd+VF+717SlbqpRlw1TTlbVJcR2 nHcJ3p5JGbU/+hOTroOWUr7kGdpdYlWLfcyOL8iT3rZXtAzH/POjPAmQv9VVRaC+ anm5zn+gZ5GH9XF+pc3nrGOEZ2Ei4LtszMQjdo4Zo9V3ngCj/0OoEnkto6VLbkw= =lzG8 -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-for-davem-2017-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.15 Second set of fixes for 4.15. This time a lot of iwlwifi patches and two brcmfmac patches. Most important here are the MIC and IVC fixes for iwlwifi to unbreak 9000 series. iwlwifi * fix rate-scaling to not start lowest possible rate * fix the TX queue hang detection for AP/GO modes * fix the TX queue hang timeout in monitor interfaces * fix packet injection * remove a wrong error message when dumping PCI registers * fix race condition with RF-kill * tell mac80211 when the MIC has been stripped (9000 series) * tell mac80211 when the IVC has been stripped (9000 series) * add 2 new PCI IDs, one for 9000 and one for 22000 * fix a queue hang due during a P2P Remain-on-Channel operation brcmfmac * fix a race which sometimes caused a crash during sdio unbind * fix a kernel-doc related build error ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:48:49 -05:00
Marc Kleine-Budde	936e5d8bdf	slip: sl_alloc(): remove unused parameter "dev_t line" The first and only parameter of sl_alloc() is unused, so remove it. Fixes: `5342b77c41` slip: ("Clean up create and destroy") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:41:02 -05:00
Antoine Tenart	8a7b741e76	net: mvpp2: fix the RSS table entry offset The macro used to access or set an RSS table entry was using an offset of 8, while it should use an offset of 0. This lead to wrongly configure the RSS table, not accessing the right entries. Fixes: `1d7d15d79f` ("net: mvpp2: initialize the RSS tables") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:35:15 -05:00
David S. Miller	2deeb49550	Merge branch 'cxgb4-collect-hardware-logs-via-ethtool' Rahul Lakkireddy says: ==================== cxgb4: collect hardware logs via ethtool Collect more hardware logs via ethtool --get-dump facility. Patch 1 collects on-chip memory layout information. Patch 2 collects on-chip MC memory dumps. Patch 3 collects HMA memory dump. Patch 4 evaluates and skips TX and RX payload regions in memory dumps. Patch 5 collects egress and ingress SGE queue contexts. Patch 6 collects PCIe configuration logs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:51 -05:00
Rahul Lakkireddy	6078ab196b	cxgb4: collect PCIe configuration logs Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	736c3b9447	cxgb4: collect egress and ingress SGE queue contexts Use meminfo to identify the egress and ingress context regions and fetch all valid contexts from those regions. Also flush all contexts before attempting collection to prevent stale information. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	c1219653f3	cxgb4: skip TX and RX payload regions in memory dumps Use meminfo to identify TX and RX payload regions and skip them in collection of EDC, MC, and HMA. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	4db0401f8a	cxgb4: collect HMA memory dump Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	a1c69520f7	cxgb4: collect MC memory dump Use meminfo to get base address and size of MC memory. Also use same meminfo for EDC memory dumps. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:50 -05:00
Rahul Lakkireddy	123e25c4a5	cxgb4: collect on-chip memory information Collect memory layout of various on-chip memory regions. Move code for collecting on-chip memory information to cudbg_lib.c and update cxgb4_debugfs.c to use the common function. Also include cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity structure forward declarations in cudbg_lib.h. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:31:49 -05:00
David S. Miller	62fd8b189e	Merge branch 'veth-and-GSO-maximums' Stephen Hemminger says: ==================== veth and GSO maximums This is the more general way to solving the issue of GSO limits not being set correctly for containers on Azure. If a GSO packet is sent to host that exceeds the limit (reported by NDIS), then the host is forced to do segmentation in software which has noticeable performance impact. The core rtnetlink infrastructure already has the messages and infrastructure to allow changing gso limits. With an updated iproute2 the following already works: # ip li set dev dummy0 gso_max_size 30000 These patches are about making it easier with veth. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:23:00 -05:00
Stephen Hemminger	72d24955b4	veth: set peer GSO values When new veth is created, and GSO values have been configured on one device, clone those values to the peer. For example: # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2 This should create vm1 <--> vm2 with both having GSO maximum size set to 65530. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:22:59 -05:00
Stephen Hemminger	46e6b992c2	rtnetlink: allow GSO maximums to be set on device creation Netlink device already allows changing GSO sizes with ip set command. The part that is missing is allowing overriding GSO settings on device creation. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:22:59 -05:00
Niklas Cassel	5a6a0445d1	net: stmmac: fix broken dma_interrupt handling for multi-queues There is nothing that says that number of TX queues == number of RX queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue. This code is obviously wrong: for (chan = 0; chan < tx_channel_count; chan++) { struct stmmac_rx_queue rx_q = &priv->rx_queue[chan]; priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an uninitialized napi_struct to __napi_schedule(), causing us to crash in net_rx_action(), because napi_struct->poll is zero. [12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [12846.768014] pgd = (ptrval) [12846.770742] [00000000] pgd=39ec7831, pte=00000000, ppte=00000000 [12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM [12846.782942] Modules linked in: [12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36 [12846.794177] Hardware name: Axis ARTPEC-6 Platform [12846.798879] task: (ptrval) task.stack: (ptrval) [12846.803407] PC is at 0x0 [12846.805942] LR is at net_rx_action+0x274/0x43c [12846.810383] pc : [<00000000>] lr : [<80bff064>] psr: 200e0113 [12846.816648] sp : b90d9ae8 ip : b90d9ae8 fp : b90d9b44 [12846.821871] r10: 00000008 r9 : 0013250e r8 : 00000100 [12846.827094] r7 : 0000012c r6 : 00000000 r5 : 00000001 r4 : bac84900 [12846.833619] r3 : 00000000 r2 : b90d9b08 r1 : 00000000 r0 : bac84900 Since each DMA channel can be used for rx and tx simultaneously, the current code should probably be rewritten so that napi_struct is embedded in a new struct stmmac_channel. That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue where we got the IRQ, instead of looping through all tx queues. This is also how the xgbe driver does it (another driver for this IP). Fixes: `c22a3f48ef` ("net: stmmac: adding multiple napi mechanism") Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:18:40 -05:00
David S. Miller	b7e445a1c8	Merge branch 'tcp-RACK-loss-recovery-bug-fixes' Yuchung Cheng says: ==================== tcp: RACK loss recovery bug fixes This patch set has four minor bug fixes in TCP RACK loss recovery. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:12 -05:00
Yuchung Cheng	6065fd0d17	tcp: evaluate packet losses upon RTT change RACK skips an ACK unless it advances the most recently delivered TX timestamp (rack.mstamp). Since RACK also uses the most recent RTT to decide if a packet is lost, RACK should still run the loss detection whenever the most recent RTT changes. For example, an ACK that does not advance the timestamp but triggers the cwnd undo due to reordering, would then use the most recent (higher) RTT measurement to detect further losses. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	428aec5e69	tcp: fix off-by-one bug in RACK RACK should mark a packet lost when remaining wait time is zero. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	cd1fc85b43	tcp: always evaluate losses in RACK upon undo When sender detects spurious retransmission, all packets marked lost are remarked to be in-flight. However some may be considered lost based on its timestamps in RACK. This patch forces RACK to re-evaluate, which may be skipped previously if the ACK does not advance RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Yuchung Cheng	0ce294d884	tcp: correctly test congestion state in RACK RACK does not test the loss recovery state correctly to compute the reordering window. It assumes if lost_out is zero then TCP is not in loss recovery. But it can be zero during recovery before calling tcp_rack_detect_loss(): when an ACK acknowledges all packets marked lost before receiving this ACK, but has not yet to discover new ones by tcp_rack_detect_loss(). The fix is to simply test the congestion state directly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:14:11 -05:00
Egil Hjelmeland	2e8d243e88	net: dsa: lan9303: Protect ALR operations with mutex ALR table operations are a sequence of related register operations which should be protected from concurrent access. The alr_cache should also be protected. Add alr_mutex doing that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:12:33 -05:00
Jiri Pirko	df45bf84e4	net: sched: fix use-after-free in tcf_block_put_ext Since the block is freed with last chain being put, once we reach the end of iteration of list_for_each_entry_safe, the block may be already freed. I'm hitting this only by creating and deleting clsact: [ 202.171952] ================================================================== [ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390 [ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796 [ 202.194508] [ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5 [ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016 [ 202.213613] Call Trace: [ 202.216369] dump_stack+0xda/0x169 [ 202.220192] ? dma_virt_map_sg+0x147/0x147 [ 202.224790] ? show_regs_print_info+0x54/0x54 [ 202.229691] ? tcf_chain_destroy+0x1dc/0x250 [ 202.234494] print_address_description+0x83/0x3d0 [ 202.239781] ? tcf_block_put_ext+0x240/0x390 [ 202.244575] kasan_report+0x1ba/0x460 [ 202.248707] ? tcf_block_put_ext+0x240/0x390 [ 202.253518] tcf_block_put_ext+0x240/0x390 [ 202.258117] ? tcf_chain_flush+0x290/0x290 [ 202.262708] ? qdisc_hash_del+0x82/0x1a0 [ 202.267111] ? qdisc_hash_add+0x50/0x50 [ 202.271411] ? __lock_is_held+0x5f/0x1a0 [ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress] [ 202.281323] qdisc_destroy+0xcb/0x240 [ 202.285445] qdisc_graft+0x216/0x7b0 [ 202.289497] tc_get_qdisc+0x260/0x560 Fix this by holding the block also by chain 0 and put chain 0 explicitly, out of the list_for_each_entry_safe loop at the very end of tcf_block_put_ext. Fixes: `efbf789739` ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:09:08 -05:00
Calvin Owens	2edbdb3159	bnxt_en: Fix sources of spurious netpoll warnings After applying `2270bc5da3` ("bnxt_en: Fix netpoll handling") and `903649e718` ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll <snip> Call Trace: [<ffffffff814be5cd>] dump_stack+0x4d/0x70 [<ffffffff8107e013>] __warn+0xd3/0xf0 [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60 [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160 [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250 [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440 [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250 [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110 [<ffffffff810c9549>] console_unlock+0x339/0x5b0 [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450 [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30 [<ffffffff81173df5>] printk+0x48/0x50 [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac] [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core] [<ffffffff81095f8b>] process_one_work+0x19b/0x480 [<ffffffff810967ca>] worker_thread+0x6a/0x520 [<ffffffff8109c7c4>] kthread+0xe4/0x100 [<ffffffff81884c52>] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 14:07:19 -05:00
David S. Miller	fc8b81a598	Merge branch 'lockless-qdisc-series' John Fastabend says: ==================== lockless qdisc series This series adds support for building lockless qdiscs. This is the result of noticing the qdisc lock is a common hot-spot in perf analysis of the Linux network stack, especially when testing with high packet per second rates. However, nothing is free and most qdiscs rely on the qdisc lock for their data structures so each qdisc must be converted on a case by case basis. In this series, to kick things off, we make pfifo_fast, mq, and mqprio lockless. Follow up series can address additional qdiscs as needed. For example sch_tbf might be useful. To allow this the lockless design is an opt-in flag. In some future utopia we convert all qdiscs and we get to drop this case analysis, but in order to make progress we live in the real-world. There are also a handful of optimizations I have behind this series and a few code cleanups that I couldn't figure out how to fit neatly into this series with out increasing the patch count. Once this is in additional patches can address this. The most notable is in skb_dequeue we can push the consumer lock out a bit and consume multiple skbs off the skb_array in pfifo fast per iteration. Ideally we could push arrays of packets at drivers as well but we would need the infrastructure for this. The other notable improvement is to do less locking in the overrun cases where bad tx queue list and gso_skb are being hit. Although, nice in theory in practice this is the error case and I haven't found a benchmark where this matters yet. For testing... My first test case uses multiple containers (via cilium) where multiple client containers use 'wrk' to benchmark connections with a server container running lighttpd. Where lighttpd is configured to use multiple threads, one per core. Additionally this test has a proxy agent running so all traffic takes an extra hop through a proxy container. In these cases each TCP packet traverses the egress qdisc layer at least four times and the ingress qdisc layer an additional four times. This makes for a good stress test IMO, perf details below. The other micro-benchmark I run is injecting packets directly into qdisc layer using pktgen. This uses the benchmark script, ./pktgen_bench_xmit_mode_queue_xmit.sh Benchmarks taken in two cases, "base" running latest net-next no changes to qdisc layer and "qdisc" tests run with qdisc lockless updates. Numbers reported in req/sec. All virtual 'veth' devices run with pfifo_fast in the qdisc test case. `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"` conns 16 32 64 1024 ----------------------------------------------- base: 18831 20201 21393 29151 qdisc: 19309 21063 23899 29265 notice in all cases we see performance improvement when running with qdisc case. Microbenchmarks using pktgen are as follows, `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000 base(mq): 2.1Mpps base(pfifo_fast): 2.1Mpps qdisc(mq): 2.6Mpps qdisc(pfifo_fast): 2.6Mpps notice numbers are the same for mq and pfifo_fast because only testing a single thread here. In both tests we see a nice bump in performance gain. The key with 'mq' is it is already per txq ring so contention is minimal in the above cases. Qdiscs such as tbf or htb which have more contention will likely show larger gains when/if lockless versions are implemented. Thanks to everyone who helped with this work especially Daniel Borkmann, Eric Dumazet and Willem de Bruijn for discussing the design and reviewing versions of the code. Changes from the RFC: dropped a couple patches off the end, fixed a bug with skb_queue_walk_safe not unlinking skb in all cases, fixed a lockdep splat with pfifo_fast_destroy not calling *_bh lock variant, addressed _most_ of Willem's comments, there was a bug in the bulk locking (final patch) of the RFC series. @Willem, I left out lockdep annotation for a follow on series to add lockdep more completely, rather than just in code I touched. Comments and feedback welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:27 -05:00
John Fastabend	c5ad119fb6	net: sched: pfifo_fast use skb_array This converts the pfifo_fast qdisc to use the skb_array data structure and set the lockless qdisc bit. pfifo_fast is the first qdisc to support the lockless bit that can be a child of a qdisc requiring locking. So we add logic to clear the lock bit on initialization in these cases when the qdisc graft operation occurs. This also removes the logic used to pick the next band to dequeue from and instead just checks a per priority array for packets from top priority to lowest. This might need to be a bit more clever but seems to work for now. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	4a86a4cf68	net: skb_array: expose peek API This adds a peek routine to skb_array.h for use with qdisc. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00
John Fastabend	ce679e8df7	net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then called independently for enqueue and dequeue operations. However statistics are aggregated and pushed up to the "master" qdisc. This patch adds support for any of the sub-qdiscs to be per cpu statistic qdiscs. To handle this case add a check when calculating stats and aggregate the per cpu stats if needed. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-12-08 13:32:26 -05:00

1 2 3 4 5 ...

722916 Commits