Commit Graph

737144 Commits

Author SHA1 Message Date
Linus Torvalds
cc006a2241 platform-drivers-x86 for v4.16-2
DEFINE_SHOW_ATTRIBUTE() macro defined privately in 3 locations is
 useful for new and old users to avoid a lot of code duplication.
 
 Move the macro to seq_file.h.
 
 Along with above, clean up 3 drivers to use that macro.
 
 The following is an automated git shortlog grouped by driver:
 
 dell-laptop:
  -  Re-use DEFINE_SHOW_ATTRIBUTE() macro
 
 ideapad-laptop:
  -  Re-use DEFINE_SHOW_ATTRIBUTE() macro
 
 samsung-laptop:
  -  Re-use DEFINE_SHOW_ATTRIBUTE() macro
 
 seq_file:
  -  Introduce DEFINE_SHOW_ATTRIBUTE() helper macro
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEhiZOUlnC9oKN3n3AmT3/83c5Sy0FAlp7AFIACgkQmT3/83c5
 Sy3LFQ//QZQcURDdVA0Kr98tHBdZuwA3VnfQ92WTmEOScflSoiaTidi5L/Pg+J/I
 g04T8UA/8I6UsvOzT/ULOkrSCj6CawAWeZCBiYaESrhM8sD29msxycPEf9wgPV2L
 iDzxtsGDYFiXm25Zd61Tdgv24NqLXDhv0C+MMF35N8RRzo4mxMy6CU7ZqNUZ2U8L
 eO7fdj0K6tXgOUNE90xziV/Rwwja3Rh9clb3l1AOK/l0uaWjaTfPeJJJGsG6aMAo
 upT+YauvzAxLm6ViWQn+kM29qvgXPPhSrxhlW0yOf5jHcSnpi5hCiqG0CypJJuBE
 Vo1bC7oHpep1Inn2eN8mz6qaidvHzqi/0nmzr4yu+QHQM+iC5vhlH2mw8oGoVw5O
 1znT6UX3BV4oRpbmSmYWseUnj6tV8vkaQ/394B86VtdVkfTV9J0VxIQfyf10I6Tq
 wK0aqvOvMqBi0PJdP1yhTnZ3SKRDRkRbTXUIkrnVvLyKku/SA+uA/yTZZD5BW0hI
 RY7s1//jiXpoHzKD0sV2kXMjbaOdhBYJMCtH9e31MwMCxSN8bSbA+CyC4vtWM6uC
 UWfnOtTs2ujilmZEmsh9zfX4m4DM4Li6YxrkC2enCqkwT5FRnnqxgnYQsKUMwbiz
 petClTONFKPDED8gP1kRgpVj+tQLrj/FFiSSwncZ73njIoU6Ooc=
 =jXCu
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.16-2' of git://git.infradead.org/linux-platform-drivers-x86

Pull more x86 platform-drivers updates from Andy Shevchenko:
 "The DEFINE_SHOW_ATTRIBUTE() macro was defined privately in three
  locations and is useful for new and old users to avoid a lot of code
  duplication.

  Move the macro to seq_file.h.

  Along with above, clean up three drivers to use that macro.

  This, due to dependencies, was sent separately since affected changes
  weren't upstream originally yet. The rationale of doing this now is to
  allow use of new macro in v4.17 cycle in a conflictless manner"

* tag 'platform-drivers-x86-v4.16-2' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
  platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
  platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
  seq_file: Introduce DEFINE_SHOW_ATTRIBUTE() helper macro
2018-02-07 12:47:23 -08:00
Jani Nikula
6dd3104e78 drm/i915/bios: add DP max link rate to VBT child device struct
Update VBT defs to reflect revision 216. While at it, default the
expected child device struct size to sizeof the size rather than a
hardcoded value.

v2: Fix bit order (David)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180118153310.32437-1-jani.nikula@intel.com
(cherry picked from commit c4fb60b9ab)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-02-07 12:32:14 -08:00
Linus Torvalds
7590e37bda ASoC: Updates for v4.16
With the merge window having been delayed for another week here's
 another batch of updates that came in during that week.  There's a few
 important fixes in here, mainly a fix for I/O on a number of devices
 caused by some of the component rework and a fix for a potential issue
 if more than one component in a link provides compressed operations.
 The I/O fixes are particularly important as the problem causes a power
 regression on a number of OMAP platforms.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlp65E4THGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0Oy1B/9XQd6twiSRfNKtbdujOyvjc8lpZ01n
 JYHVySXyL9ZqAnIblM7Or/iXVEtsAVcXgFegF3SLHAY6VF7DQ1pDolnxUtuXOxOj
 j+/2Y4wnDGCjXEr0tMoxrbNUIqlVZLCpPwsPo3vvVbr6sLLmQYVposNp2A2sK2bz
 uWm9E3Nr26Q0UctzjWQM5+AFHSouyL7zDPfBCoWkEToP7163w6r4JDr991KdNGwP
 Ac+5qjRUSldsn8WB2ngm8ioqbq+aOvsz2THYjG8gxrlQK+BWsyCDqF7f1d9GWse2
 7k+xZLrdJrVkBMOnpvOx/Y4KRfe9BAFZBZ3KRbi2IR++7TD3902xEX27
 =aIRm
 -----END PGP SIGNATURE-----

Merge tag 'asoc-v4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound

Pull more ASoC updates from Mark Brown:
 "With the merge window having been delayed for another week here's
  another batch of updates that came in during that week.

  There's a few important fixes in here, mainly a fix for I/O on a
  number of devices caused by some of the component rework and a fix for
  a potential issue if more than one component in a link provides
  compressed operations. The I/O fixes are particularly important as the
  problem causes a power regression on a number of OMAP platforms"

* tag 'asoc-v4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound: (22 commits)
  ASoC: stm32: add of dependency for stm32 drivers
  ASoC: mt8173-rt5650: fix child-node lookup
  ASoC: dapm: fix debugfs read using path->connected
  ASoC: compress: Fixup error messages
  ASoC: compress: Remove some extraneous blank lines
  ASoC: compress: Correct handling of copy callback
  ASoC: Intel: kbl: Enable mclk and ssp sclk early
  ASoC: Intel: Skylake: Add extended I2S config blob support in Clock driver
  ASoC: Intel: Skylake: Add ssp clock driver
  ASoC: Fix twl4030 and 6040 regression by adding back read and write
  ASoC: sun8i-codec: Add ADC support for a33
  ASoC: rockchip: Use dummy_dai for rt5514 dsp dailink
  ASoC: soc-pcm: rename .pmdown_time to .use_pmdown_time for Component
  ASoC: ak4613: call dummy write for PW_MGMT1/3 when Playback
  ASoC: soc-pcm: don't call flush_delayed_work() many times in soc_pcm_private_free()
  ASoC: soc-core: snd_soc_rtdcom_lookup() cares component driver name
  ASoC: sam9x5_wm8731: Drop 'ASoC' prefix from error messages
  ASoC: sam9g20_wm8731: use dev_*() logging functions
  ASoC: max98373 Changed SPDX header in C++ comments style
  ASoC: dmic: Fix check of return value from read of 'num-channels'
  ...
2018-02-07 12:11:09 -08:00
Dave Airlie
2dd27794b9 Merge branch 'drm-next-4.16' of git://people.freedesktop.org/~agd5f/linux into drm-next
A few more misc fixes for 4.16.

* 'drm-next-4.16' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: re-enable CGCG on CZ and disable on ST
  drm/amdgpu: disable coarse grain clockgating for ST
  drm/radeon: adjust tested variable
  drm/amdgpu: remove WARN_ON when VM isn't found v2
  drm/amdgpu: fix locking in vega10_ih_prescreen_iv
  drm/amdgpu: fix another potential cause of VM faults
  drm/amdgpu: use queue 0 for kiq ring
  drm/ttm: Fix 'buf' pointer update in ttm_bo_vm_access_kmap() (v2)
  drm/ttm: fix missing parameter change for ttm_bo_cleanup_refs
2018-02-08 06:05:52 +10:00
Linus Torvalds
7e6127c124 linux-watchdog 4.16-rc1 merge window tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAlp4I7MACgkQ+iyteGJfRsp34QCgi3O78Sajso9iJNMj5KJsQTEt
 VOsAn1ioW2jO9CmZQoj4IlVStlKU0NCN
 =jeHv
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-4.16-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - new watchdog device drivers for Realtek RTD1295 and Spreadtrum SC9860
   platform

 - add support for the following devices: jz4780 SoC, AST25xx series SoC
   and r8a77970 SoC

 - convert to watchdog framework: i6300esb_wdt, xen_wdt and sp5100_tco

 - several fixes for watchdog core

 - remove at32ap700x and obsolete documentation

 - gpio: Convert to use GPIO descriptors

 - rename gemini into FTWDT010 as this IP block is generc from Faraday
   Technology

 - various clean-ups and small bugfixes

 - add Guenter Roeck as co-maintainer

 - change maintainers e-mail address

* tag 'linux-watchdog-4.16-rc1' of git://www.linux-watchdog.org/linux-watchdog: (74 commits)
  documentation: watchdog: remove documentation of w83697hf_wdt/w83697ug_wdt
  documentation: watchdog: remove documentation for ixp2000
  documentation: watchdog: remove documentation of at32ap700x_wdt
  watchdog: remove at32ap700x_wdt
  watchdog: sp5100_tco: Add support for recent FCH versions
  watchdog: sp5100-tco: Abort if watchdog is disabled by hardware
  watchdog: sp5100_tco: Use bit operations
  watchdog: sp5100_tco: Convert to use watchdog subsystem
  watchdog: sp5100_tco: Clean up function and variable names
  watchdog: sp5100_tco: Use dev_ print functions where possible
  watchdog: sp5100_tco: Match PCI device early
  watchdog: sp5100_tco: Clean up sp5100_tco_setupdevice
  watchdog: sp5100_tco: Use standard error codes
  watchdog: sp5100_tco: Use request_muxed_region where possible
  watchdog: sp5100_tco: Fix watchdog disable bit
  watchdog: sp5100_tco: Always use SP5100_IO_PM_{INDEX_REG,DATA_REG}
  watchdog: core: make sure the watchdog_worker is not deferred
  watchdog: mt7621: switch to using managed devm_watchdog_register_device()
  watchdog: mt7621: set WDOG_HW_RUNNING bit when appropriate
  watchdog: imx2_wdt: restore previous timeout after suspend+resume
  ...
2018-02-07 11:54:34 -08:00
Tang Junhui
73ac105be3 bcache: fix for data collapse after re-attaching an attached device
back-end device sdm has already attached a cache_set with ID
f67ebe1f-f8bc-4d73-bfe5-9dc88607f119, then try to attach with
another cache set, and it returns with an error:
[root]# cd /sys/block/sdm/bcache
[root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
-bash: echo: write error: Invalid argument

After that, execute a command to modify the label of bcache
device:
[root]# echo data_disk1 > label

Then we reboot the system, when the system power on, the back-end
device can not attach to cache_set, a messages show in the log:
Feb  5 12:05:52 ceph152 kernel: [922385.508498] bcache:
bch_cached_dev_attach() couldn't find uuid for sdm in set

In sysfs_attach(), dc->sb.set_uuid was assigned to the value
which input through sysfs, no matter whether it is success
or not in bch_cached_dev_attach(). For example, If the back-end
device has already attached to an cache set, bch_cached_dev_attach()
would fail, but dc->sb.set_uuid was changed. Then modify the
label of bcache device, it will call bch_write_bdev_super(),
which would write the dc->sb.set_uuid to the super block, so we
record a wrong cache set ID in the super block, after the system
reboot, the cache set couldn't find the uuid of the back-end
device, so the bcache device couldn't exist and use any more.

In this patch, we don't assigned cache set ID to dc->sb.set_uuid
in sysfs_attach() directly, but input it into bch_cached_dev_attach(),
and assigned dc->sb.set_uuid to the cache set ID after the back-end
device attached to the cache set successful.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Tang Junhui
7f4fc93d47 bcache: return attach error when no cache set exist
I attach a back-end device to a cache set, and the cache set is not
registered yet, this back-end device did not attach successfully, and no
error returned:
[root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach
[root]#

In sysfs_attach(), the return value "v" is initialized to "size" in
the beginning, and if no cache set exist in bch_cache_sets, the "v" value
would not change any more, and return to sysfs, sysfs regard it as success
since the "size" is a positive number.

This patch fixes this issue by assigning "v" with "-ENOENT" in the
initialization.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Coly Li
7a5e3ecbe5 bcache: set writeback_rate_update_seconds in range [1, 60] seconds
dc->writeback_rate_update_seconds can be set via sysfs and its value can
be set to [1, ULONG_MAX].  It does not make sense to set such a large
value, 60 seconds is long enough value considering the default 5 seconds
works well for long time.

Because dc->writeback_rate_update is a special delayed work, it re-arms
itself inside the delayed work routine update_writeback_rate(). When
stopping it by cancel_delayed_work_sync(), there should be a timeout to
wait and make sure the re-armed delayed work is stopped too. A small max
value of dc->writeback_rate_update_seconds is also helpful to decide a
reasonable small timeout.

This patch limits sysfs interface to set dc->writeback_rate_update_seconds
in range of [1, 60] seconds, and replaces the hand-coded number by macros.

Changelog:
v2: fix a rebase typo in v4, which is pointed out by Michael Lyle.
v1: initial version.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Tang Junhui
682811b3ce bcache: fix for allocator and register thread race
After long time running of random small IO writing,
I reboot the machine, and after the machine power on,
I found bcache got stuck, the stack is:
[root@ceph153 ~]# cat /proc/2510/task/*/stack
[<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache]
[<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache]
[<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache]
[<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache]
[<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
[<ffffffff810a631f>] kthread+0xcf/0xe0
[<ffffffff8164c318>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
[root@ceph153 ~]# cat /proc/2038/task/*/stack
[<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
[<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache]
[<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache]
[<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache]
[<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache]
[<ffffffff812f702f>] kobj_attr_store+0xf/0x20
[<ffffffff8125b216>] sysfs_write_file+0xc6/0x140
[<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0
[<ffffffff811e069f>] SyS_write+0x7f/0xe0
[<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1
The stack shows the register thread and allocator thread
were getting stuck when registering cache device.

I reboot the machine several times, the issue always
exsit in this machine.

I debug the code, and found the call trace as bellow:
register_bcache()
   ==>run_cache_set()
      ==>bch_journal_replay()
         ==>bch_btree_insert()
            ==>__bch_btree_map_nodes()
               ==>btree_insert_fn()
                  ==>btree_split() //node need split
                     ==>btree_check_reserve()
In btree_check_reserve(), It will check if there is enough buckets
of RESERVE_BTREE type, since allocator thread did not work yet, so
no buckets of RESERVE_BTREE type allocated, so the register thread
waits on c->btree_cache_wait, and goes to sleep.

Then the allocator thread initialized, the call trace is bellow:
bch_allocator_thread()
==>bch_prio_write()
   ==>bch_journal_meta()
      ==>bch_journal()
         ==>journal_wait_for_write()
In journal_wait_for_write(), It will check if journal is full by
journal_full(), but the long time random small IO writing
causes the exhaustion of journal buckets(journal.blocks_free=0),
In order to release the journal buckets,
the allocator calls btree_flush_write() to flush keys to
btree nodes, and waits on c->journal.wait until btree nodes writing
over or there has already some journal buckets space, then the
allocator thread goes to sleep. but in btree_flush_write(), since
bch_journal_replay() is not finished, so no btree nodes have journal
(condition "if (btree_current_write(b)->journal)" never satisfied),
so we got no btree node to flush, no journal bucket released,
and allocator sleep all the times.

Through the above analysis, we can see that:
1) Register thread wait for allocator thread to allocate buckets of
   RESERVE_BTREE type;
2) Alloctor thread wait for register thread to replay journal, so it
   can flush btree nodes and get journal bucket.
   then they are all got stuck by waiting for each other.

Hua Rui provided a patch for me, by allocating some buckets of
RESERVE_BTREE type in advance, so the register thread can get bucket
when btree node splitting and no need to waiting for the allocator
thread. I tested it, it has effect, and register thread run a step
forward, but finally are still got stuck, the reason is only 8 bucket
of RESERVE_BTREE type were allocated, and in bch_journal_replay(),
after 2 btree nodes splitting, only 4 bucket of RESERVE_BTREE type left,
then btree_check_reserve() is not satisfied anymore, so it goes to sleep
again, and in the same time, alloctor thread did not flush enough btree
nodes to release a journal bucket, so they all got stuck again.

So we need to allocate more buckets of RESERVE_BTREE type in advance,
but how much is enough?  By experience and test, I think it should be
as much as journal buckets. Then I modify the code as this patch,
and test in the machine, and it works.

This patch modified base on Hua Rui’s patch, and allocate more buckets
of RESERVE_BTREE type in advance to avoid register thread and allocate
thread going to wait for each other.

[patch v2] ca->sb.njournal_buckets would be 0 in the first time after
cache creation, and no journal exists, so just 8 btree buckets is OK.

Signed-off-by: Hua Rui <huarui.dev@gmail.com>
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Coly Li
7ba0d830dc bcache: set error_limit correctly
Struct cache uses io_errors for two purposes,
- Error decay: when cache set error_decay is set, io_errors is used to
  generate a small piece of delay when I/O error happens.
- I/O errors counter: in order to generate big enough value for error
  decay, I/O errors counter value is stored by left shifting 20 bits (a.k.a
  IO_ERROR_SHIFT).

In function bch_count_io_errors(), if I/O errors counter reaches cache set
error limit, bch_cache_set_error() will be called to retire the whold cache
set. But current code is problematic when checking the error limit, see the
following code piece from bch_count_io_errors(),

 90     if (error) {
 91             char buf[BDEVNAME_SIZE];
 92             unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
 93                                                 &ca->io_errors);
 94             errors >>= IO_ERROR_SHIFT;
 95
 96             if (errors < ca->set->error_limit)
 97                     pr_err("%s: IO error on %s, recovering",
 98                            bdevname(ca->bdev, buf), m);
 99             else
100                     bch_cache_set_error(ca->set,
101                                         "%s: too many IO errors %s",
102                                         bdevname(ca->bdev, buf), m);
103     }

At line 94, errors is right shifting IO_ERROR_SHIFT bits, now it is real
errors counter to compare at line 96. But ca->set->error_limit is initia-
lized with an amplified value in bch_cache_set_alloc(),
1545         c->error_limit  = 8 << IO_ERROR_SHIFT;

It means by default, in bch_count_io_errors(), before 8<<20 errors happened
bch_cache_set_error() won't be called to retire the problematic cache
device. If the average request size is 64KB, it means bcache won't handle
failed device until 512GB data is requested. This is too large to be an I/O
threashold. So I believe the correct error limit should be much less.

This patch sets default cache set error limit to 8, then in
bch_count_io_errors() when errors counter reaches 8 (if it is default
value), function bch_cache_set_error() will be called to retire the whole
cache set. This patch also removes bits shifting when store or show
io_error_limit value via sysfs interface.

Nowadays most of SSDs handle internal flash failure automatically by LBA
address re-indirect mapping. If an I/O error can be observed by upper layer
code, it will be a notable error because that SSD can not re-indirect
map the problematic LBA address to an available flash block. This situation
indicates the whole SSD will be failed very soon. Therefore setting 8 as
the default io error limit value makes sense, it is enough for most of
cache devices.

Changelog:
v2: add reviewed-by from Hannes.
v1: initial version for review.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Coly Li
99361bbf26 bcache: properly set task state in bch_writeback_thread()
Kernel thread routine bch_writeback_thread() has the following code block,

447         down_write(&dc->writeback_lock);
448~450     if (check conditions) {
451                 up_write(&dc->writeback_lock);
452                 set_current_state(TASK_INTERRUPTIBLE);
453
454                 if (kthread_should_stop())
455                         return 0;
456
457                 schedule();
458                 continue;
459         }

If condition check is true, its task state is set to TASK_INTERRUPTIBLE
and call schedule() to wait for others to wake up it.

There are 2 issues in current code,
1, Task state is set to TASK_INTERRUPTIBLE after the condition checks, if
   another process changes the condition and call wake_up_process(dc->
   writeback_thread), then at line 452 task state is set back to
   TASK_INTERRUPTIBLE, the writeback kernel thread will lose a chance to be
   waken up.
2, At line 454 if kthread_should_stop() is true, writeback kernel thread
   will return to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and
   call do_exit(). It is not good to enter do_exit() with task state
   TASK_INTERRUPTIBLE, in following code path might_sleep() is called and a
   warning message is reported by __might_sleep(): "WARNING: do not call
   blocking ops when !TASK_RUNNING; state=1 set at [xxxx]".

For the first issue, task state should be set before condition checks.
Ineed because dc->writeback_lock is required when modifying all the
conditions, calling set_current_state() inside code block where dc->
writeback_lock is hold is safe. But this is quite implicit, so I still move
set_current_state() before all the condition checks.

For the second issue, frankley speaking it does not hurt when kernel thread
exits with TASK_INTERRUPTIBLE state, but this warning message scares users,
makes them feel there might be something risky with bcache and hurt their
data.  Setting task state to TASK_RUNNING before returning fixes this
problem.

In alloc.c:allocator_wait(), there is also a similar issue, and is also
fixed in this patch.

Changelog:
v3: merge two similar fixes into one patch
v2: fix the race issue in v1 patch.
v1: initial buggy fix.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Tang Junhui
c4dc2497d5 bcache: fix high CPU occupancy during journal
After long time small writing I/O running, we found the occupancy of CPU
is very high and I/O performance has been reduced by about half:

[root@ceph151 internal]# top
top - 15:51:05 up 1 day,2:43,  4 users,  load average: 16.89, 15.15, 16.53
Tasks: 2063 total,   4 running, 2059 sleeping,   0 stopped,   0 zombie
%Cpu(s):4.3 us, 17.1 sy 0.0 ni, 66.1 id, 12.0 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem : 65450044 total, 24586420 free, 38909008 used,  1954616 buff/cache
KiB Swap: 65667068 total, 65667068 free,        0 used. 25136812 avail Mem

  PID USER PR NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2023 root 20  0       0      0      0 S 55.1  0.0   0:04.42 kworker/11:191
14126 root 20  0       0      0      0 S 42.9  0.0   0:08.72 kworker/10:3
 9292 root 20  0       0      0      0 S 30.4  0.0   1:10.99 kworker/6:1
 8553 ceph 20  0 4242492 1.805g  18804 S 30.0  2.9 410:07.04 ceph-osd
12287 root 20  0       0      0      0 S 26.7  0.0   0:28.13 kworker/7:85
31019 root 20  0       0      0      0 S 26.1  0.0   1:30.79 kworker/22:1
 1787 root 20  0       0      0      0 R 25.7  0.0   5:18.45 kworker/8:7
32169 root 20  0       0      0      0 S 14.5  0.0   1:01.92 kworker/23:1
21476 root 20  0       0      0      0 S 13.9  0.0   0:05.09 kworker/1:54
 2204 root 20  0       0      0      0 S 12.5  0.0   1:25.17 kworker/9:10
16994 root 20  0       0      0      0 S 12.2  0.0   0:06.27 kworker/5:106
15714 root 20  0       0      0      0 R 10.9  0.0   0:01.85 kworker/19:2
 9661 ceph 20  0 4246876 1.731g  18800 S 10.6  2.8 403:00.80 ceph-osd
11460 ceph 20  0 4164692 2.206g  18876 S 10.6  3.5 360:27.19 ceph-osd
 9960 root 20  0       0      0      0 S 10.2  0.0   0:02.75 kworker/2:139
11699 ceph 20  0 4169244 1.920g  18920 S 10.2  3.1 355:23.67 ceph-osd
 6843 ceph 20  0 4197632 1.810g  18900 S  9.6  2.9 380:08.30 ceph-osd

The kernel work consumed a lot of CPU, and I found they are running journal
work, The journal is reclaiming source and flush btree node with surprising
frequency.

Through further analysis, we found that in btree_flush_write(), we try to
get a btree node with the smallest fifo idex to flush by traverse all the
btree nodein c->bucket_hash, after we getting it, since no locker protects
it, this btree node may have been written to cache device by other works,
and if this occurred, we retry to traverse in c->bucket_hash and get
another btree node. When the problem occurrd, the retry times is very high,
and we consume a lot of CPU in looking for a appropriate btree node.

In this patch, we try to record 128 btree nodes with the smallest fifo idex
in heap, and pop one by one when we need to flush btree node. It greatly
reduces the time for the loop to find the appropriate BTREE node, and also
reduce the occupancy of CPU.

[note by mpl: this triggers a checkpatch error because of adjacent,
pre-existing style violations]

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Tang Junhui
a728eacbbd bcache: add journal statistic
Sometimes, Journal takes up a lot of CPU, we need statistics
to know what's the journal is doing. So this patch provide
some journal statistics:
1) reclaim: how many times the journal try to reclaim resource,
   usually the journal bucket or/and the pin are exhausted.
2) flush_write: how many times the journal try to flush btree node
   to cache device, usually the journal bucket are exhausted.
3) retry_flush_write: how many times the journal retry to flush
   the next btree node, usually the previous tree node have been
   flushed by other thread.
we show these statistic by sysfs interface. Through these statistics
We can totally see the status of journal module when the CPU is too
high.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-07 12:50:01 -07:00
Linus Torvalds
413879a10b RISC-V changes for 4.16
This tag contains the fixes we'd like to target for the 4.16 merge
 window.  It's not as much as I was originally hoping to do but between
 glibc, the chip, and FOSDEM there just wasn't enough time to get
 everything put together.  As such, this merge window is essentially just
 going to be small changes.  This includes mostly cleanups:
 
 * A build fix failure to the audit test cases.  RISC-V doesn't have
   renameat because the generic syscall ABI moved to renameat2 by the
   time of our port.  The syscall audit test cases don't understand this,
   so I added a trivial fix.  This went through mailing list review
   during the 4.15 merge window, but nobody has picked it up so I think
   it's best to just do this here.
 * The removal of our command-line argument processing code.  The
   "mem_end" stuff was broken and the rest duplicated generic device tree
   code.  The generic code was already being called.
 * Some unused/redundant code has been removed, including
   __ARCH_HAVE_MMU, current_pgdir, and the initialization of init_mm.pgd.
 * SUM is disabled upon taking a trap, which means that user memory is
   protected during traps taking inside copy_{to,from}_user().
 * The sptbr CSR has been renamed to satp in C code.  We haven't changed
   the assembly code in order to maintain compatibility with binutils
   2.29, which doesn't understand the new name.
 
 Additionally, we're adding some new features:
 
 * Basic ftrace support, thanks to Alan Kao!
 * Support for ZONE_DMA32.  This is necessary for all the normal reasons,
   but also to deal with a deficiency in the Xilinx PCIe controller we're
   using on our FPGA-based systems.  While the ZONE_DMA32 addition should
   be sufficient for most uses, it doesn't complete the fix for the
   Xilinx controller.
 * TLB shootdowns now only target the harts where they're necessary,
   instead of applying to all harts in the system.
 
 These patches have all been sitting on our linux-next branch for a while
 now.  Due to time constraints this is all I feel comfortable submitting
 during the 4.16 merge window, hopefully we'll do better next time!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlp7N2gTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQX8kD/4xxw6TuuESmDXxAQPQ+S8J98uKRfAF
 9kMMzJJARcW5sT1vo3pKpE8+Ss0Hy2fIcaYsw5Je/Yl7vdAy/Dk7X3/mx7mxf5BP
 8m2cSd7DFLLLhntZTbr1Y5fJ6awFLtzI46zn/SzTdTatLWKXNLS5wmPKE33ddq/C
 iTi4k/as8E/vuNtuPy1GsOF0gICpZ2xB4YoMwTgWfpxTekBkUktO3EOHmZTwQEEM
 U1muB+4WoqusbBt6cP3Q7cUF3b6aMVSevWnywZGkD+yWOGRXTVzMgT7R4YlKEOre
 OQypZocYUbRmZQMZACKpgHIcOZpePaSTIQ2zzhXEPVGB0XAHtMRnAaVtwPxwG6c4
 EThDCN9ldShutKqT4XilHrh5gf0sy7qG0PIidPhMmXH9LCeTSAU4VdISJP1jkq19
 chiMHlf6+/DhikyiH0+lK/MX8vQMt6UJL1SlRKO/c2FxxKAZKnENJ+tuAlkAlwoC
 gnvZsE5BUYw1ptRHXR0d5C4m8M2M9LPZfpWYcg+1mRO9EA+kt0XCupL7RsrdFuoa
 FCVEhP/JMaiX0JtmAHfVIU0yNGjH3b5xi3FoGk2Aoj/c8O3F5YcwT5C5nO+jpv32
 n9vyMR20/721+yA2dFIlq4DnelwdZczOTqrcDYJrLxXzk8OXUFFffbe4kbDCxp34
 WniBxwnY9BF25g==
 =cNRH
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains the fixes we'd like to target for the 4.16 merge window.
  It's not as much as I was originally hoping to do but between glibc,
  the chip, and FOSDEM there just wasn't enough time to get everything
  put together. As such, this merge window is essentially just going to
  be small changes. This includes mostly cleanups:

   - A build fix failure to the audit test cases.

     RISC-V doesn't have renameat because the generic syscall ABI moved
     to renameat2 by the time of our port. The syscall audit test cases
     don't understand this, so I added a trivial fix. This went through
     mailing list review during the 4.15 merge window, but nobody has
     picked it up so I think it's best to just do this here.

   - The removal of our command-line argument processing code. The
     "mem_end" stuff was broken and the rest duplicated generic device
     tree code. The generic code was already being called.

   - Some unused/redundant code has been removed, including
     __ARCH_HAVE_MMU, current_pgdir, and the initialization of
     init_mm.pgd.

   - SUM is disabled upon taking a trap, which means that user memory is
     protected during traps taking inside copy_{to,from}_user().

   - The sptbr CSR has been renamed to satp in C code. We haven't
     changed the assembly code in order to maintain compatibility with
     binutils 2.29, which doesn't understand the new name.

  Additionally, we're adding some new features:

   - Basic ftrace support, thanks to Alan Kao!

   - Support for ZONE_DMA32.

     This is necessary for all the normal reasons, but also to deal with
     a deficiency in the Xilinx PCIe controller we're using on our
     FPGA-based systems. While the ZONE_DMA32 addition should be
     sufficient for most uses, it doesn't complete the fix for the
     Xilinx controller.

   - TLB shootdowns now only target the harts where they're necessary,
     instead of applying to all harts in the system.

  These patches have all been sitting on our linux-next branch for a
  while now. Due to time constraints this is all I feel comfortable
  submitting during the 4.16 merge window, hopefully we'll do better
  next time!"

[ Note to self: "harts" is RISC-V speak for "hardware threads".  I had
  to look that up.    - Linus ]

* tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  riscv: inline set_pgdir into its only caller
  riscv: rename sptbr to satp
  riscv: don't read back satp in paging_init
  riscv: remove the unused current_pgdir function
  riscv: add ZONE_DMA32
  RISC-V: Limit the scope of TLB shootdowns
  riscv: disable SUM in the exception handler
  riscv: remove redundant unlikely()
  riscv: remove unused __ARCH_HAVE_MMU define
  riscv/ftrace: Add basic support
  RISC-V: Remove mem_end command line processing
  RISC-V: Remove duplicate command-line parsing logic
  audit: Avoid build failures on systems without renameat
2018-02-07 11:33:08 -08:00
Linus Torvalds
0bd2afc748 MIPS fixes for 4.16-rc1
A couple of MIPS fixes for 4.16-rc1, including an important regression
 in 4.15 and a rather more longstanding corner case build fix.
 
 - Fix CPS regression on older binutils due to MIPS_ISA_LEVEL_RAW fix
   (4.15)
 - Fix allmodconfig + CONFIG_MACH_TX49XX=y builds due to incorrect use of
   IS_ENABLED() (2.6.28)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlp6/k8ACgkQbAtpk944
 dnoIjxAAqvovuVTJfTVn2MwDwIagfOOZo2PASvWrt5YRnPyCapFZEgPsDh0Qv8ca
 dLugAMbz3uTk0xy+xzoKtbUowFJK65G15xn7a+UyHKFrEGflHd6lgb7SvGTNMWqJ
 ru7xo/Plk8zdrF6NCjAh1a3vSn1aYEIBjb4pjai9TH8cNXFfPjlOvcxKUj7MqRZQ
 /IyDAfWa87NAh8amJKoiCHfQk3u/awu0jn3Vcrjog6kLKDH0sxd09EPIcBkznUl+
 CCO8vlvBvbsaMOV1Dwl6qxFFMQ3/OL+QEe3HrrDM/DURzwWGWnWktC6O9WXqgq8c
 IJ3t84jMX/BoGqybS8rX9Uy+Qr7ieV7lNgSbd3QQYqA8PLPLrp1xqsAcUlXJm4pj
 KVIpJ2bAtJF54y0o4x6KbtiVsjHIoVm9k1ftnGNfcS6HjbCWQgAoccj2HZfIdYaN
 /9pnqU5HYRIOrOp165LgdGOUUotA9JWigco45/ywWrtztAITIh8hFR4IiIXqfl6L
 xbfl8dsjQTuGBIjtwNI8PjKbeD8Dhz2/bEEj+2YmwtTI/l/iIXepTNszWZaE6G03
 f0PfA9XVyej8BFPk/SQQy3rw1nvjWE+aFeKkwEZCwBQea9Nlhyrj/CvBFIagj9rQ
 R6AV4Fn67SyACri8hy90KG+dyfALtppn+2rWLBEWQPcahg9FWds=
 =jeJr
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS fixes from James Hogan:
 "A couple of MIPS fixes for 4.16-rc1, including an important regression
  in 4.15 and a rather more longstanding corner case build fix.

  These are separate from the main pull request as one of the bugs fixed
  was only recently introduced in v4.15-rc8.

   - Fix CPS regression on older binutils due to MIPS_ISA_LEVEL_RAW fix
     (4.15)

   - Fix allmodconfig + CONFIG_MACH_TX49XX=y builds due to incorrect use
     of IS_ENABLED() (2.6.28)"

* tag 'mips_fixes_4.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
  MIPS: CPS: Fix MIPS_ISA_LEVEL_RAW fallout
2018-02-07 11:31:05 -08:00
Linus Torvalds
8578953687 MIPS changes for 4.16
These are the main MIPS changes for 4.16. Rough overview:
  - Basic support for the Ingenic JZ4770 based GCW Zero open-source
    handheld video game console
  - Support for the Ranchu board (used by Android emulator)
  - Various cleanups and misc improvements
 
 Fixes:
  - Fix generic platform's USB_*HCI_BIG_ENDIAN selects (4.9)
  - Fix vmlinuz default build when ZBOOT selected
  - Fix clean up of vmlinuz targets
  - Fix command line duplication (in preparation for Ingenic JZ4770)
 
 Miscellaneous:
  - Allow Processor ID reads to be to be optimised away by the compiler
    (improves performance when running in guest)
  - Push ARCH_MIGHT_HAVE_PC_SERIO/PARPORT down to platform level to
    disable on generic platform with Ranchu board support
  - Add helpers for assembler macro instructions for older assemblers
  - Use assembler macro instructions to support VZ, XPA & MSA operations
    on older assemblers, removing C wrapper duplication
  - Various improvements to VZ & XPA assembly wrappers
  - Add drivers/platform/mips/ to MIPS MAINTAINERS entry
 
 Minor cleanups:
  - Misc FPU emulation cleanups (removal of unnecessary include, moving
    macros to common header, checkpatch and sparse fixes)
  - Remove duplicate assignment of core in play_dead()
  - Remove duplication in watchpoint handling
  - Remove mips_dma_mapping_error() stub
  - Use NULL instead of 0 in prepare_ftrace_return()
  - Use proper kernel-doc Return keyword for
    __compute_return_epc_for_insn()
  - Remove duplicate semicolon in csum_fold()
 
 Platform support:
 
 Broadcom:
  - Enable ZBOOT on BCM47xx
 
 Generic platform:
  - Add Ranchu board support, used by Android emulator
  - Fix machine compatible string matching for Ranchu
  - Support GIC in EIC mode
 
 Ingenic platforms:
  - Add DT, defconfig and other support for JZ4770 SoC and GCW Zero
  - Support dynamnic machine types (i.e. JZ4740 / JZ4770 / JZ4780)
  - Add Ingenic JZ4770 CGU clocks
  - General Ingenic clk changes to prepare for JZ4770 SoC support
  - Use common command line handling code
  - Add DT vendor prefix to GCW (Game Consoles Worldwide)
 
 Loongson:
  - Add MAINTAINERS entry for Loongson2 and Loongson3 platforms
  - Drop 32-bit support for Loongson 2E/2F devices
  - Fix build failures due to multiple use of "MEM_RESERVED"
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlp64ZUACgkQbAtpk944
 dnrXrg//UPWeZMye/uHw0eEeJJjybyA0IWpJ6M94gbHxpduhQsjYU3CR9U4ZBmhs
 feY53dahh0RCR0k28EF8DEPkoUbGFKmyYCnvqAuatq1XOjAZtlgS9+VVzbK+Iswm
 XkZD1MBoZ49o0meyjQrH/2Ri/t6tHuzo0G2WtRJ8FnVruN9ymG6D5pR4Y31gDucb
 6JkTXjNfRJIKd0qJgP+c3HdlKE7jlnCTJnzHdA+5FbZVwKbm2/6KxbQo5Gc1BXJX
 4j7I4nJ0FIz0cB6fHbcccFSW9w3lPa9bQ4XpYPJYE6a36QldFvMWHRxvI6rxrACN
 5mPqIB9uqvtW8sdUbJtNRXFlNnm8XZzvsNqP6WxGQPW70+q2camni9W/gC1ifQsF
 +uVV54yj3Ky8xQNbbpfbDp/tFXRuLtj3DV4/a3dwA5J0YGEuMn1zzV5WTTzymFVn
 3NKl62LDUlzBNw0d1lUPMY6P1oKcNnRhLxBq0cxaB7AdOLF0jlCQ/wYUhXPpblj6
 CQB4cupR4IMvL7FZ1RS98e1RHaF8mXpaZBnGXT251DxZEre9OXCJxDdzqemedTVi
 SaCcvQqApCQD8OihL+wHZLew8Vp4EvwGAa++Evu/Ot4rWjY/9MGLtewYk8jkOEf6
 qk30dDn86ou29HNwpzfWadIq5Zew+QftifGOzTcuzgrJXXt+jH8=
 =7iwT
 -----END PGP SIGNATURE-----

Merge tag 'mips_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS updates from James Hogan:
 "These are the main MIPS changes for 4.16.

  Rough overview:

   (1) Basic support for the Ingenic JZ4770 based GCW Zero open-source
       handheld video game console

   (2) Support for the Ranchu board (used by Android emulator)

   (3) Various cleanups and misc improvements

  More detailed summary:

  Fixes:
   - Fix generic platform's USB_*HCI_BIG_ENDIAN selects (4.9)
   - Fix vmlinuz default build when ZBOOT selected
   - Fix clean up of vmlinuz targets
   - Fix command line duplication (in preparation for Ingenic JZ4770)

  Miscellaneous:
   - Allow Processor ID reads to be to be optimised away by the compiler
     (improves performance when running in guest)
   - Push ARCH_MIGHT_HAVE_PC_SERIO/PARPORT down to platform level to
     disable on generic platform with Ranchu board support
   - Add helpers for assembler macro instructions for older assemblers
   - Use assembler macro instructions to support VZ, XPA & MSA
     operations on older assemblers, removing C wrapper duplication
   - Various improvements to VZ & XPA assembly wrappers
   - Add drivers/platform/mips/ to MIPS MAINTAINERS entry

  Minor cleanups:
   - Misc FPU emulation cleanups (removal of unnecessary include, moving
     macros to common header, checkpatch and sparse fixes)
   - Remove duplicate assignment of core in play_dead()
   - Remove duplication in watchpoint handling
   - Remove mips_dma_mapping_error() stub
   - Use NULL instead of 0 in prepare_ftrace_return()
   - Use proper kernel-doc Return keyword for
     __compute_return_epc_for_insn()
   - Remove duplicate semicolon in csum_fold()

  Platform support:

  Broadcom:
   - Enable ZBOOT on BCM47xx

  Generic platform:
   - Add Ranchu board support, used by Android emulator
   - Fix machine compatible string matching for Ranchu
   - Support GIC in EIC mode

  Ingenic platforms:
   - Add DT, defconfig and other support for JZ4770 SoC and GCW Zero
   - Support dynamnic machine types (i.e. JZ4740 / JZ4770 / JZ4780)
   - Add Ingenic JZ4770 CGU clocks
   - General Ingenic clk changes to prepare for JZ4770 SoC support
   - Use common command line handling code
   - Add DT vendor prefix to GCW (Game Consoles Worldwide)

  Loongson:
   - Add MAINTAINERS entry for Loongson2 and Loongson3 platforms
   - Drop 32-bit support for Loongson 2E/2F devices
   - Fix build failures due to multiple use of 'MEM_RESERVED'"

* tag 'mips_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (53 commits)
  MIPS: Malta: Sanitize mouse and keyboard configuration.
  MIPS: Update defconfigs after previous patch.
  MIPS: Push ARCH_MIGHT_HAVE_PC_SERIO down to platform level
  MIPS: Push ARCH_MIGHT_HAVE_PC_PARPORT down to platform level
  MIPS: SMP-CPS: Remove duplicate assignment of core in play_dead
  MIPS: Generic: Support GIC in EIC mode
  MIPS: generic: Fix Makefile alignment
  MIPS: generic: Fix ranchu_of_match[] termination
  MIPS: generic: Fix machine compatible matching
  MIPS: Loongson fix name confict - MEM_RESERVED
  MIPS: bcm47xx: enable ZBOOT support
  MIPS: Fix trailing semicolon
  MIPS: Watch: Avoid duplication of bits in mips_read_watch_registers
  MIPS: Watch: Avoid duplication of bits in mips_install_watch_registers.
  MIPS: MSA: Update helpers to use new asm macros
  MIPS: XPA: Standardise readx/writex accessors
  MIPS: XPA: Allow use of $0 (zero) to MTHC0
  MIPS: XPA: Use XPA instructions in assembly
  MIPS: VZ: Pass GC0 register names in $n format
  MIPS: VZ: Update helpers to use new asm macros
  ...
2018-02-07 11:22:44 -08:00
David S. Miller
4d80ecdb80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for you net tree, they
are:

1) Restore __GFP_NORETRY in xt_table allocations to mitigate effects of
   large memory allocation requests, from Michal Hocko.

2) Release IPv6 fragment queue in case of error in fragmentation header,
   this is a follow up to amend patch 83f1999cae, from Subash Abhinov
   Kasiviswanathan.

3) Flowtable infrastructure depends on NETFILTER_INGRESS as it registers
   a hook for each flowtable, reported by John Crispin.

4) Missing initialization of info->priv in xt_cgroup version 1, from
   Cong Wang.

5) Give a chance to garbage collector to run after scheduling flowtable
   cleanup.

6) Releasing flowtable content on nft_flow_offload module removal is
   not required at all, there is not dependencies between this module
   and flowtables, remove it.

7) Fix missing xt_rateest_mutex grabbing for hash insertions, also from
   Cong Wang.

8) Move nf_flow_table_cleanup() routine to flowtable core, this patch is
   a dependency for the next patch in this list.

9) Flowtable resources are not properly released on removal from the
   control plane. Fix this resource leak by scheduling removal of all
   entries and explicit call to the garbage collector.

10) nf_ct_nat_offset() declaration is dead code, this function prototype
    is not used anywhere, remove it. From Taehee Yoo.

11) Fix another flowtable resource leak on entry insertion failures,
    this patch also fixes a possible use-after-free. Patch from Felix
    Fietkau.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-07 13:55:20 -05:00
Linus Torvalds
e03ab6c4ad A few late-arriving fixes, along with Konstantin's PGP document that had
no reason to wait another cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaejsmAAoJEI3ONVYwIuV6iqIP/2vJ1GfaYwVKScGybqTAGFIj
 5zAwjkd3LhYYYfx5DkEcpAqYllZS/jmj5th5095TxDlDrwfxNwMYBQUP7rYp6s6E
 IupSGq4z/n/6MseeXom6J7IxRaRBbERB7VLb0eAA8UOTis96kgnjuMRB3bCVHlfN
 7VTo8uFm4/ZKKntSq4gl59FnVna4PkZU10TB0tWh617lyPkfJR8FkBcdBvlXbCYt
 03LAkQ5F1XfVMM9gq6F3IGnmPCkIRbzMrXm1tXZNRYFkfu0bw3lN5um0DOLrq4Qw
 NTybo1W5hKf3LQLVJGIDpMzYkrp4GMrBLy6cNaK6V5vHHAmRew+zHBu93dSmzqE+
 hT+Zih6FLXJV2Bc/GnwGiFFSHU44MBVO4DTACRTFAW19gnVaFZe51uCJ+gzwNkhk
 v4c4riGxRAVeZA6FoXXcjGxwjA2qmjMqcjiRHW2pxSHb1Oz+KwQHDd8muS/G0H/a
 CBCwlkfrJHPWVkrYRSuq912feIYMs9I/y/46lIhsowmFYBCF6g1SZ5xk1lggM+uo
 FJp/+m1vQvbWrkotFZjXiv/riWURKTQykSUHA9+cQcU/UIkjfx0YrmGkpfjaM2G8
 QTPq1HU9iRLpXXYrKNnzLEPH4CUU+ffGD8Ou9migxrNQnSut/Dgsii0Eqr/AGNbW
 bChrYwjeT7OAvzW8A5ZZ
 =ffz4
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.16-2' of git://git.lwn.net/linux

Pull more documentation updates from Jonathan Corbet:
 "A few late-arriving fixes, along with Konstantin's PGP document that
  had no reason to wait another cycle"

* tag 'docs-4.16-2' of git://git.lwn.net/linux:
  Documentation/process: tweak pgp maintainer guide
  Documentation/admin-guide: fixes for thunderbolt.rst
  Documentation: mips: Update AU1xxx_IDE Kconfig dependencies
  Fix broken link in Documentation/process/kernel-docs.rst
  Documentation/process: kernel maintainer PGP guide
2018-02-07 09:42:59 -08:00
Steve French
5f60a56494 Add missing structs and defines from recent SMB3.1.1 documentation
The last two updates to MS-SMB2 protocol documentation added various
flags and structs (especially relating to SMB3.1.1 tree connect).
Add missing defines and structs to smb2pdu.h

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-07 09:36:46 -06:00
Steve French
f9de151bf2 address lock imbalance warnings in smbdirect.c
Although at least one of these was an overly strict sparse warning
in the new smbdirect code, it is cleaner to fix - so no warnings.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-07 09:36:43 -06:00
Arnd Bergmann
ade7db991b cifs: silence compiler warnings showing up with gcc-8.0.0
This bug was fixed before, but came up again with the latest
compiler in another function:

fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
   strncpy(parm_data->list[0].name, ea_name, name_len);

Let's apply the same fix that was used for the other instances.

Fixes: b2a3ad9ca5 ("cifs: silence compiler warnings showing up with gcc-4.7.0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-02-07 09:36:41 -06:00
Steve French
ede2e520a1 Add some missing debug fields in server and tcon structs
Allow dumping out debug information on dialect, signing, unix extensions
and encryption

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-02-07 09:36:38 -06:00
Julia Lawall
a2b0fe7435 coccinelle: deref_null: avoid useless computation
The effect of the rules ifm1, pr11, and pr12 is only used in the final rule,
which depends on context && !org && !report.  Thus these rules should only
be performed in those circumstances.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-08 00:16:12 +09:00
Martin Schwidefsky
f19fbd5ed6 s390: introduce execute-trampolines for branches
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

	basr	%r14,%r1

is replaced with a PC relative call to a new thunk

	brasl	%r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
	exrl	0,0f
	j	.
0:	br	%r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-07 15:57:02 +01:00
Julia Lawall
e856f3a7d7 coccinelle: devm_free: reduce false positives
Some files use both a non-devm allocation and a devm_allocation.  Don't
complain about a free when the same function contains a non-devm
allocation.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-07 23:53:09 +09:00
Trond Myklebust
2275cde4cc SUNRPC: Queue latency-sensitive socket tasks to xprtiod
The response to a write_space notification is very latency sensitive,
so we should queue it to the lower latency xprtiod_workqueue. This
is something we already do for the other cases where an rpc task
holds the transport XPRT_LOCKED bitlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-02-07 09:25:52 -05:00
Oleksij Rempel
4e12d654ba ath9k_htc: add Altai WA1011N-GU
as reported in:
https://github.com/qca/open-ath9k-htc-firmware/pull/71#issuecomment-361100070

Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:16:50 +02:00
Yu Wang
50e79e2525 ath10k: fix kernel panic issue during pci probe
If device gone during chip reset, ar->normal_mode_fw.board is not
initialized, but ath10k_debug_print_hwfw_info() will try to access its
member, which will cause 'kernel NULL pointer' issue. This was found
using a faulty device (pci link went down sometimes) in a random
insmod/rmmod/other-op test.
To fix it, check ar->normal_mode_fw.board before accessing the member.

pci 0000:02:00.0: BAR 0: assigned [mem 0xf7400000-0xf75fffff 64bit]
ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002)
ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
ath10k_pci 0000:02:00.0: failed to read device register, device is gone
ath10k_pci 0000:02:00.0: failed to wait for target init: -5
ath10k_pci 0000:02:00.0: failed to warm reset: -5
ath10k_pci 0000:02:00.0: firmware crashed during chip reset
ath10k_pci 0000:02:00.0: firmware crashed! (uuid 5d018951-b8e1-404a-8fde-923078b4423a)
ath10k_pci 0000:02:00.0: (null) target 0x00000000 chip_id 0x00340aff sub 0000:0000
ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1
ath10k_pci 0000:02:00.0: firmware ver  api 0 features  crc32 00000000
...
BUG: unable to handle kernel NULL pointer dereference at 00000004
...
Call Trace:
 [<fb4e7882>] ath10k_print_driver_info+0x12/0x20 [ath10k_core]
 [<fb62b7dd>] ath10k_pci_fw_crashed_dump+0x6d/0x4d0 [ath10k_pci]
 [<fb629f07>] ? ath10k_pci_sleep.part.19+0x57/0xc0 [ath10k_pci]
 [<fb62c8ee>] ath10k_pci_hif_power_up+0x14e/0x1b0 [ath10k_pci]
 [<c10477fb>] ? do_page_fault+0xb/0x10
 [<fb4eb934>] ath10k_core_register_work+0x24/0x840 [ath10k_core]
 [<c18a00d8>] ? netlbl_unlhsh_remove+0x178/0x410
 [<c10477f0>] ? __do_page_fault+0x480/0x480
 [<c1068e44>] process_one_work+0x114/0x3e0
 [<c1069d07>] worker_thread+0x37/0x4a0
 [<c106e294>] kthread+0xa4/0xc0
 [<c1069cd0>] ? create_worker+0x180/0x180
 [<c106e1f0>] ? kthread_park+0x50/0x50
 [<c18ab4f7>] ret_from_fork+0x1b/0x28
 Code: 78 80 b8 50 09 00 00 00 75 5d 8d 75 94 c7 44 24 08 aa d7 52 fb c7 44 24 04 64 00 00 00
 89 34 24 e8 82 52 e2 c5 8b 83 dc 08 00 00 <8b> 50 04 8b 08 31 c0 e8 20 57 e3 c5 89 44 24 10 8b 83 58 09 00
 EIP: [<fb4e7754>]-
 ath10k_debug_print_board_info+0x34/0xb0 [ath10k_core]
 SS:ESP 0068:f4921d90
 CR2: 0000000000000004

Signed-off-by: Yu Wang <yyuwang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:16:10 +02:00
Wojciech Dubowik
b9607de6cf ath9k: Fix get channel default noise floor
Commit 8da58553cc ("ath9k: Use calibrated noise floor value
when available") introduced regression in ath9k_hw_getchan_noise
where per chain nominal noise floor has been taken instead default
for channel.
Revert to original default channel noise floor.

Fixes: 8da58553cc ("ath9k: Use calibrated noise floor value when available")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:14:08 +02:00
Tobias Schramm
34f1cb339c ath10k: add support for Ubiquiti rebranded QCA988X v2
Some modern Ubiquiti devices contain a rebranded QCA988X rev2 with
a custom Ubiquiti vendor and device id. This patch adds support for
those devices, treating them as a QCA988X v2.

Signed-off-by: Tobias Schramm <tobleminer@gmail.com>
[kvalo@codeaurora.org: rebase, add missing fields in hw_params, fix a long line in pci.c:61]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:09:44 +02:00
Tobias Schramm
d5cc611193 PCI: Add Ubiquiti Networks vendor ID
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Tobias Schramm <tobleminer@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:09:36 +02:00
Yu Wang
0a7fe71823 ath10k: correct the length of DRAM dump for QCA6174 hw3.x/QCA9377 hw1.1
The length of DRAM dump for QCA6174 hw3.0/hw3.2 and QCA9377 hw1.1
are less than the actual value, some coredump contents are missed.
To fix it, change the length from 0x90000 to 0xa8000.

Fixes: 703f261dd7 ("ath10k: add memory dump support for QCA6174/QCA9377")
Signed-off-by: Yu Wang <yyuwang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 16:07:35 +02:00
Larry Finger
c713fb071e rtlwifi: rtl8821ae: Fix connection lost problem correctly
There has been a coding error in rtl8821ae since it was first introduced,
namely that an 8-bit register was read using a 16-bit read in
_rtl8821ae_dbi_read(). This error was fixed with commit 40b368af4b
("rtlwifi: Fix alignment issues"); however, this change led to
instability in the connection. To restore stability, this change
was reverted in commit b8b8b16352 ("rtlwifi: rtl8821ae: Fix connection
lost problem").

Unfortunately, the unaligned access causes machine checks in ARM
architecture, and we were finally forced to find the actual cause of the
problem on x86 platforms. Following a suggestion from Pkshih
<pkshih@realtek.com>, it was found that increasing the ASPM L1
latency from 0 to 7 fixed the instability. This parameter was varied to
see if a smaller value would work; however, it appears that 7 is the
safest value. A new symbol is defined for this quantity, thus it can be
easily changed if necessary.

Fixes: b8b8b16352 ("rtlwifi: rtl8821ae: Fix connection lost problem")
Cc: Stable <stable@vger.kernel.org> # 4.14+
Fix-suggested-by: Pkshih <pkshih@realtek.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: James Cameron <quozl@laptop.org>  # x86_64 OLPC NL3
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-02-07 15:38:37 +02:00
Mark Brown
3d0a352a55
Merge remote-tracking branches 'asoc/topic/sam9x5_wm8731', 'asoc/topic/sgtl5000' and 'asoc/topic/sun8i-codec' into asoc-next 2018-02-07 11:25:47 +00:00
Mark Brown
171248777a
Merge remote-tracking branches 'asoc/topic/max98373', 'asoc/topic/mtk', 'asoc/topic/pcm', 'asoc/topic/rockchip' and 'asoc/topic/sam9g20_wm8731' into asoc-next 2018-02-07 11:25:44 +00:00
Mark Brown
535b218a7f
Merge remote-tracking branches 'asoc/topic/ak4613', 'asoc/topic/core', 'asoc/topic/dmic' and 'asoc/topic/intel' into asoc-next 2018-02-07 11:25:41 +00:00
Mark Brown
008a03c0cf
Merge remote-tracking branch 'asoc/topic/compress' into asoc-next 2018-02-07 11:24:21 +00:00
Mark Brown
aa72645481
Merge remote-tracking branch 'asoc/fix/twl-breakage' into asoc-next 2018-02-07 11:23:48 +00:00
Mark Brown
ce8ee02d51
Merge remote-tracking branches 'asoc/fix/compress', 'asoc/fix/core', 'asoc/fix/dapm', 'asoc/fix/mtk' and 'asoc/fix/stm' into asoc-next 2018-02-07 11:21:02 +00:00
Olivier Moysan
276d70f758
ASoC: stm32: add of dependency for stm32 drivers
Add of dependency for STM32 ASoC drivers.
DFSDM of dependency is already inherited
from STM32_DFSDM_ADC dependency.

Signed-off-by: olivier moysan <olivier.moysan@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07 11:19:53 +00:00
Arnd Bergmann
168b6511e8 x86: hibernate: fix swsusp_arch_resume() prototype
The declaration for swsusp_arch_resume() marks it as 'asmlinkage',
but the definition in x86-32 does not, and it fails to include
the header with the declaration.  This leads to a warning when
building with link-time-optimizations:

kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch]
 extern asmlinkage int swsusp_arch_resume(void);
                       ^
arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here
 int swsusp_arch_resume(void)

This moves the declaration into a globally visible header file
and fixes up both x86 definitions to match it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07 12:18:23 +01:00
Johan Hovold
3a0a7b2610
ASoC: mt8173-rt5650: fix child-node lookup
This driver used the wrong OF-helper when looking up the optional
capture-codec child node during probe. Instead of searching just
children of the sound node, a tree-wide depth-first search starting at
the unrelated platform node was done. Not only could this end up
matching an unrelated node or no node at all; the platform node could
also be prematurely freed since of_find_node_by_name() drops a reference
to its first argument. This particular pattern has been observed leading
to crashes after probe deferrals in other drivers.

Fix this by dropping the broken call to of_find_node_by_name() and
keeping only the second, correct lookup using of_get_child_by_name()
while taking care not to bail out if the optional node is missing.

Note that this also addresses two capture-codec node-reference leaks
(one for each of the original helper calls).

Compile tested only.

Fixes: d349caeb05 ("ASoC: mediatek: Add second I2S on mt8173-rt5650 machine driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07 11:16:39 +00:00
KaiChieh Chuang
28735af3f5
ASoC: dapm: fix debugfs read using path->connected
This fix a bug in dapm_widget_power_read_file(),
where it may sent opposite order of source/sink widget
into the p->connected().

for example,
static int connected_check(source, sink);
{"w_sink", NULL, "w_source", connected_check}

the dapm_widget_power_read_file() will query p->connected()
in following case
	p->conneted("w_source", "w_sink")
	p->conneted("w_sink", "w_source")
we should avoid the last case, since it's the wrong order (source/sink)
as declared in snd_soc_dapm_route.

Signed-off-by: KaiChieh Chuang <kaichieh.chuang@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-02-07 11:15:26 +00:00
Ulf Hansson
a3381e3a65 PM / domains: Fix up domain-idle-states OF parsing
Commit b539cc82d4 (PM / Domains: Ignore domain-idle-states that are
not compatible), made it possible to ignore non-compatible
domain-idle-states OF nodes. However, in case that happens while doing
the OF parsing, the number of elements in the allocated array would
exceed the numbers actually needed, thus wasting memory.

Fix this by pre-iterating the genpd OF node and counting the number of
compatible domain-idle-states nodes, before doing the allocation. While
doing this, it makes sense to rework the code a bit to avoid open coding,
of parts responsible for the OF node iteration.

Let's also take the opportunity to clarify the function header for
of_genpd_parse_idle_states(), about what is being returned in case of
errors.

Fixes: b539cc82d4 (PM / Domains: Ignore domain-idle-states that are not compatible)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07 12:02:01 +01:00
Felix Fietkau
0ff90b6c20 netfilter: nf_flow_offload: fix use-after-free and a resource leak
flow_offload_del frees the flow, so all associated resource must be
freed before.

Since the ct entry in struct flow_offload_entry was allocated by
flow_offload_alloc, it should be freed by flow_offload_free to take care
of the error handling path when flow_offload_add fails.

While at it, make flow_offload_del static, since it should never be
called directly, only from the gc step

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-02-07 11:55:52 +01:00
Taehee Yoo
d8ed960058 netfilter: remove useless prototype
prototype nf_ct_nat_offset is not used anymore.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
2018-02-07 11:54:52 +01:00
Sudeep Holla
eb85c50c24 cpufreq: scpi: fix static checker warning cdev isn't an ERR_PTR
Commit 343a8d17fa (cpufreq: scpi: remove arm_big_little dependency)
leads to the following static checker warning:

	drivers/cpufreq/scpi-cpufreq.c:203 scpi_cpufreq_ready()
	warn: 'cdev' isn't an ERR_PTR

of_cpufreq_cooling_register() returns NULL on error. This patch removes
the incorrect IS_ERR check on the returned pointer.

Fixes: 343a8d17fa (cpufreq: scpi: remove arm_big_little dependency)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-07 11:51:41 +01:00
Andy Shevchenko
9a7c551ba7 platform/x86: samsung-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
...instead of open coding file operations followed by custom ->open()
callbacks per each attribute.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07 12:50:23 +02:00
Andy Shevchenko
334c4efd29 platform/x86: ideapad-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
...instead of open coding file operations followed by custom ->open()
callbacks per each attribute.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07 12:50:22 +02:00
Andy Shevchenko
b04eb8aaab platform/x86: dell-laptop: Re-use DEFINE_SHOW_ATTRIBUTE() macro
...instead of open coding file operations followed by custom ->open()
callbacks per each attribute.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-02-07 12:50:22 +02:00