When doing large parallel file creates on a 16p machines, large amounts of
time is being spent in _xfs_buf_find(). A system wide profile with perf top
shows this:
1134740.00 19.3% _xfs_buf_find
733142.00 12.5% __ticket_spin_lock
The problem is that the hash contains 45,000 buffers, and the hash table width
is only 256 buffers. That means we've got around 200 buffers per chain, and
searching it is quite expensive. The hash table size needs to increase.
Secondly, every time we do a lookup, we promote the buffer we find to the head
of the hash chain. This is causing cachelines to be dirtied and causes
invalidation of cachelines across all CPUs that may have walked the hash chain
recently. hence every walk of the hash chain is effectively a cold cache walk.
Remove the promotion to avoid this invalidation.
The results are:
1045043.00 21.2% __ticket_spin_lock
326184.00 6.6% _xfs_buf_find
A 70% drop in the CPU usage when looking up buffers. Unfortunately that does
not result in an increase in performance underthis workload as contention on
the inode_lock soaks up most of the reduction in CPU usage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The 7th entry in a lot of evergreen i2c gpio tables is partially
zeroed. Fix the entry.
Should fix the missing ddc entry in:
https://bugs.freedesktop.org/show_bug.cgi?id=29255
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Packets entering GRO might have different headrooms, even for a given
flow (because of implementation details in drivers, like copybreak).
We cant force drivers to deliver packets with a fixed headroom.
1) fix skb_segment()
skb_segment() makes the false assumption headrooms of fragments are same
than the head. When CHECKSUM_PARTIAL is used, this can give csum_start
errors, and crash later in skb_copy_and_csum_dev()
2) allocate a minimal skb for head of frag_list
skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
allocate a fresh skb. This adds NET_SKB_PAD to a padding already
provided by netdevice, depending on various things, like copybreak.
Use alloc_skb() to allocate an exact padding, to reduce cache line
needs:
NET_SKB_PAD + NET_IP_ALIGN
bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
Many thanks to Plamen Petrov, testing many debugging patches !
With help of Jarek Poplawski.
Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar vain to commit 17762060c2
("bridge: Clear IPCB before possible entry into IP stack")
Any time we call into the IP stack we have to make sure the state
there is as expected by the ipv4 code.
With help from Eric Dumazet and Herbert Xu.
Reported-by: Bandan Das <bandan.das@stratus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tv parameter was added to disable the tv-out connector,
however, it caused a crash if it was set to 0 due to
drm_connector_init not getting called. If tv=0, don't
attempt to add the connector.
Might fix:
https://bugzilla.kernel.org/show_bug.cgi?id=17241
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There has been periodic evidence that LVDS, on at least some
panels, prefers the dividers selected by the legacy pll algo.
This patch forces the use of the legacy pll algo on RV515
LVDS panels. The old behavior (new pll algo) can be selected
by setting the new_pll module parameter to 1.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This code was originally for forcing some clocks on certain asics.
However, this code was later moved to asic specific functions
for all of the affected asics. The only users of the original
code at this point were r600, rv770, and evergreen and the code
was not relevant for those asics. So, remove it.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
vortex_ioctl() was grabbing vortex_private::lock around its call to
generic_mii_ioctl(). This is no longer necessary since there are more
specific locks which the mdio_{read,write}() functions will obtain.
Worse, those functions do not save and restore IRQ flags when locking
the MII state, so interrupts will be enabled when generic_mii_ioctl()
returns.
Since there is currently no need for any function to call
mdio_{read,write}() while holding another spinlock, do not change them
to save and restore IRQ flags but remove the specification of ordering
between vortex_private::lock and vortex_private::mii_lock.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dlpar code can cause a deadlock to occur when making the RTAS
configure-connector call. This occurs because we make kmalloc calls,
which can block, while parsing the rtas_data_buf and holding the
rtas_data_buf_lock. This an cause issues if someone else attempts
to grab the rtas_data_bug_lock.
This patch alleviates this issue by copying the contents of the rtas_data_buf
to a local buffer before parsing. This allows us to only hold the
rtas_data_buf_lock around the RTAS configure-connector calls.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fixes the warning:
.config:369:warning: symbol value '' invalid for ZRELADDR
and the prompt for ZRELADDR on make
Signed-off-by: Erik Gilling <konkers@android.com>
There's something very important I forgot to tell you.
What?
Don't cross the GRO streams.
Why?
It would be bad.
I'm fuzzy on the whole good/bad thing. What do you mean, "bad"?
Try to imagine all the Internet as you know it stopping instantaneously
and every bit in every packet swapping at the speed of light.
Total packet reordering.
Right. That's bad. Okay. All right. Important safety tip. Thanks, Hubert
The simplest way to stop this is just avoid doing GRO on the second port.
Very few Marvell boards support two ports per ring, and GRO is just
an optimization.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attached is a small patch to remove a warning ("warning: ISO C90 forbids
mixed declarations and code" with gcc 4.3.2).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes init_vf() function, so on each new backlog period parent's
cl_cfmin is properly updated (including further propgation towards the root),
even if the activated leaf has no upperlimit curve defined.
Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
mdiobus resources must be released on exit
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reviewing commit 1c40be12f7, I
audited other users of tc_action_ops->dump for information leaks.
That commit covered almost all of them but act_police still had a leak.
opt.limit and opt.capab aren't zeroed out before the structure is
passed out.
This patch uses the C99 initializers to zero everything unused out.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its currently illegal to call kthread_stop(NULL)
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change just add the IBM eHEA 10Gb network drivers as supported.
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling alpha generic build get errors such as:
arch/alpha/kernel/err_marvel.c: In function ‘marvel_print_err_cyc’:
arch/alpha/kernel/err_marvel.c:119: error: format ‘%ld’ expects type ‘long int’, but argument 6 has type ‘u64’
Replaced a number of %ld format specifiers with %lld since u64
is unsigned long long.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Updates the Alpha perf_event code to match the changes
recently made to the core perf_event code in commit
e78505958c.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
This patch fixes the failure to compile Alpha Generic because of
previously overlooked calls to ns87312_enable_ide(). The function has
been replaced by newer SuperIO code.
Tested-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Morten H. Larsen <m-larsen@post6.tele.dk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Let's use the standard L1_CACHE_ALIGN macro instead.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Matt Turner <mattst88@gmail.com>
This is needed for proper PCI-E support on P1021 SoCs.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add a call to of_node_put in the error handling code following a call to
of_find_compatible_node.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
local idexpression x;
expression E,E1;
statement S;
@@
*x =
(of_find_node_by_path
|of_find_node_by_name
|of_find_node_by_phandle
|of_get_parent
|of_get_next_parent
|of_get_next_child
|of_find_compatible_node
|of_match_node
)(...);
...
if (x == NULL) S
<... when != x = E
*if (...) {
... when != of_node_put(x)
when != if (...) { ... of_node_put(x); ... }
(
return <+...x...+>;
|
* return ...;
)
}
...>
of_node_put(x);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The function of_iomap returns the result of calling ioremap, so iounmap
should be called on the result in the error handling code, as done in the
normal exit of the function.
The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
local idexpression x;
expression E,E1;
identifier l;
statement S;
@@
*x = of_iomap(...);
... when != iounmap(x)
when != if (...) { ... iounmap(x); ... }
when != E = x
when any
(
if (x == NULL) S
|
if (...) {
... when != iounmap(x)
when != if (...) { ... iounmap(x); ... }
(
return <+...x...+>;
|
* return ...;
)
}
)
... when != x = E1
when any
iounmap(x);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Fixes the following compile problem on E500 platforms:
arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception':
arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function)
Also fixes the compile problem on non-E500 platforms.
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The 5 GHz CTL indexes were not being read for all hardware
devices due to the masking out through the CTL_MODE_M mask
being one bit too short. Without this the calibrated regulatory
maximum values were not being picked up when devices operate
on 5 GHz in HT40 mode. The final output power used for Atheros
devices is the minimum between the calibrated CTL values and
what CRDA provides.
Cc: stable@kernel.org [2.6.27+]
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The EEPROM is compressed on AR9003, upon decompression
the wrong upper limit was being used for the block which
prevented the 5 GHz CTL indexes from being used, which are
stored towards the end of the EEPROM block. This fix allows
the actual intended regulatory limits to be used on AR9003
hardware.
Cc: stable@kernel.org [2.6.36+]
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Once we started enforcing the a nl_table[] entry exist for
a protocol, NETLINK_USERSOCK stopped working. Add a dummy
table entry so that it works again.
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
arch/powerpc/platforms/85xx/p1022_ds.c:22:23: error: linux/lmb.h: No such file or directory
arch/powerpc/platforms/85xx/p1022_ds.c: In function 'p1022_ds_setup_arch':
arch/powerpc/platforms/85xx/p1022_ds.c💯 error: implicit declaration of function 'memblock_end_of_DRAM'
arch/powerpc/platforms/85xx/p1022_ds.c: At top level:
arch/powerpc/platforms/85xx/p1022_ds.c:147: error: 'udbg_progress' undeclared here (not in a function)
make[2]: *** [arch/powerpc/platforms/85xx/p1022_ds.o] Error 1
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Commit 99d8238f berobbed the for_each loop of its iterator! Let's be
nice and give it back, so it compiles for us.
CC: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
alloc_mayday_mask() was using alloc_cpumask_var() making
gcwq->mayday_mask contain garbage after initialization on
CONFIG_CPUMASK_OFFSTACK=y configurations. This combined with the
previously fixed GCWQ_DISASSOCIATED initialization bug could make
rescuers fall into infinite loop trying to bind to an offline cpu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: CAI Qian <caiqian@redhat.com>
init_workqueues() incorrectly marks workqueues for all possible CPUs
associated. Combined with mayday_mask initialization bug, this can
make rescuers keep trying to bind to an offline gcwq indefinitely.
Fix init_workqueues() such that only online CPUs have their gcwqs have
GCWQ_DISASSOCIATED cleared.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: CAI Qian <caiqian@redhat.com>
If irda_open_tsap() fails, the irda_bind() code tries to destroy
the ->ias_obj object by hand, but does so wrongly.
In particular, it fails to a) release the hashbin attached to the
object and b) reset the self->ias_obj pointer to NULL.
Fix both problems by using irias_delete_object() and explicitly
setting self->ias_obj to NULL, just as irda_release() does.
Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In f761622e59 we changed
early_setup_secondary so it's called using the proper kernel stack
rather than the emergency one.
Unfortunately, this stack pointer can't be used when translation is off
on PHYP as this stack pointer might be outside the RMO. This results in
the following on all non zero cpus:
cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
pc: 000000000001c50c
lr: 000000000000821c
sp: c00000001639ff90
msr: 8000000000001000
dar: c00000001639ffa0
dsisr: 42000000
current = 0xc000000016393540
paca = 0xc000000006e00200
pid = 0, comm = swapper
The original patch was only tested on bare metal system, so it never
caught this problem.
This changes __secondary_start so that we calculate the new stack
pointer but only start using it after we've called early_setup_secondary.
With this patch, the above problem goes away.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to
perf_event_do_pending call") moved the call to perf_event_do_pending
in timer_interrupt() down so that it was after the irq_enter() call.
Unfortunately this moved it after the code that checks whether it
is time for the next decrementer clock event. The result is that
the call to perf_event_do_pending() won't happen until the next
decrementer clock event is due. This was pointed out by Milton
Miller.
This fixes it by moving the check for whether it's time for the
next decrementer clock event down to the point where we're about
to call the event handler, after we've called perf_event_do_pending.
This has the side effect that on old pre-Core99 Powermacs where we
use the ppc_n_lost_interrupts mechanism to replay interrupts, a
replayed interrupt will incur a little more latency since it will
now do the code from the irq_enter down to the irq_exit, that it
used to skip. However, these machines are now old and rare enough
that this doesn't matter. To make it clear that ppc_n_lost_interrupts
is only used on Powermacs, and to speed up the code slightly on
non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts
is now conditional on CONFIG_PMAC as well as CONFIG_PPC32.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Call kexec purgatory code correctly. We were getting lucky before.
If you examine the powerpc 32bit kexec "purgatory" code you will
see it expects the following:
>From kexec-tools: purgatory/arch/ppc/v2wrap_32.S
-> calling convention:
-> r3 = physical number of this cpu (all cpus)
-> r4 = address of this chunk (master only)
As such, we need to set r3 to the current core, r4 happens to be
unused by purgatory at the moment but we go ahead and set it
here as well
Signed-off-by: Matthew McClintock <msm@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wireless extensions have an unfortunate, undocumented
requirement which requires drivers to always fill
iwp->length when returning a successful status. When
a driver doesn't do this, it leads to a kernel heap
content leak when userspace offers a larger buffer
than would have been necessary.
Arguably, this is a driver bug, as it should, if it
returns 0, fill iwp->length, even if it separately
indicated that the buffer contents was not valid.
However, we can also at least avoid the memory content
leak if the driver doesn't do this by setting the iwp
length to max_tokens, which then reflects how big the
buffer is that the driver may fill, regardless of how
big the userspace buffer is.
To illustrate the point, this patch also fixes a
corresponding cfg80211 bug (since this requirement
isn't documented nor was ever pointed out by anyone
during code review, I don't trust all drivers nor
all cfg80211 handlers to implement it correctly).
Cc: stable@kernel.org [all the way back]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The new workqueue changes helped me find this bug
that's been lingering since the changes to the work
processing in mac80211 -- the work timer is never
deleted properly. Do that to avoid having it fire
after all data structures have been freed. It can't
be re-armed because all it will do, if running, is
schedule the work, but that gets flushed later and
won't have anything to do since all work items are
gone by now (by way of interface removal).
Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Michael reported that p54* never really entered power
save mode, even tough it was enabled.
It turned out that upon a power save mode change the
firmware will set a special flag onto the last outgoing
frame tx status (which in this case is almost always the
designated PSM nullfunc frame). This flag confused the
driver; It erroneously reported transmission failures
to the stack, which then generated the next nullfunc.
and so on...
Cc: <stable@kernel.org>
Reported-by: Michael Buesch <mb@bu3sch.de>
Tested-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This avoids a NULL pointer dereference as reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=625889
When the WARN condition is hit in ieee80211_get_tx_rate, it will return
NULL. So, we need to check the return value and avoid dereferencing it
in that case.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: stable@kernel.org
Acked-by: Bob Copeland <me@bobcopeland.com>
p9_client_walk() can return error values if we run out of space or there
is a problem with the network.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
When an erroneous PEB is scheduling for scrubbing, we end up with the
following oops:
[<c0162404>] (prot_queue_del+0x0/0x50) from [<c01635b4>] (ubi_wl_scrub_peb+0xec/0x13c)
[<c01634c8>] (ubi_wl_scrub_peb+0x0/0x13c) from [<c01603bc>] (ubi_eba_read_leb+0x200/0x428)
[<c01601bc>] (ubi_eba_read_leb+0x0/0x428) from [<c015e3c0>] (ubi_leb_read+0xe8/0x138)
[<c015e2d8>] (ubi_leb_read+0x0/0x138) from [<c00d6918>] (ubifs_start_scan+0x7c/0xf4)
[<c00d689c>] (ubifs_start_scan+0x0/0xf4) from [<c00e3650>] (ubifs_recover_leb+0x3c/0x730)
[<c00e3614>] (ubifs_recover_leb+0x0/0x730) from [<c00e444c>] (ubifs_recover_log_leb+0xc8/0x2dc)
[<c00e4384>] (ubifs_recover_log_leb+0x0/0x2dc) from [<c00d7c20>] (ubifs_replay_journal+0xb90/0x13a4)
[<c00d7090>] (ubifs_replay_journal+0x0/0x13a4) from [<c00cdd68>] (ubifs_fill_super+0xb84/0x1054)
[<c00cd1e4>] (ubifs_fill_super+0x0/0x1054) from [<c00ced04>] (ubifs_get_sb+0xc4/0x2ac)
[<c00cec40>] (ubifs_get_sb+0x0/0x2ac) from [<c007f04c>] (vfs_kern_mount+0x58/0x94)
[<c007eff4>] (vfs_kern_mount+0x0/0x94) from [<c007f0e8>] (do_kern_mount+0x40/0xe8)
[<c007f0a8>] (do_kern_mount+0x0/0xe8) from [<c0095628>] (do_new_mount+0x68/0x8c)
[<c00955c0>] (do_new_mount+0x0/0x8c) from [<c00957a8>] (do_mount+0x15c/0x1b8)
[<c009564c>] (do_mount+0x0/0x1b8) from [<c0095890>] (sys_mount+0x8c/0xd4)
[<c0095804>] (sys_mount+0x0/0xd4) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c)
Kernel panic - not syncing: Fatal exception
The problem is that 'ubi_wl_scrub_peb()' does not expect that PEBs may
be in the erroneous tree, which is a bug. This patch fixes the bug
and adds corresponding check to 'ubi_wl_scrub_peb()'. Now it will simply
ignore erroneous PEBs, instead of causing an oops.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
MD_CHANGE_CLEAN is used for two different purposes and this leads to
confusion.
One of the purposes is largely mirrored by MD_CHANGE_PENDING which is
not used for anything else, so have MD_CHANGE_PENDING take over that
purpose fully.
The two purposes are:
1/ tell md_update_sb that an update is needed and that it is just a
clean/dirty transition.
2/ tell user-space that an transition from clean to dirty is pending
(something wants to write), and tell te kernel (by clearin the
flag) that the transition is OK.
The first purpose remains wit MD_CHANGE_CLEAN, the second is moved
fully to MD_CHANGE_PENDING.
This means that various places which conditionally set or cleared
MD_CHANGE_CLEAN no longer need to be conditional.
Signed-off-by: NeilBrown <neilb@suse.de>