Change event list format for user readability. perf probe --list
shows event list in "[GROUP:EVENT] EVENT-DEFINITION" format, but
this format is different from the output of perf-list, and
EVENT-DEFINITION is a bit blunt. This patch changes the format to
more user friendly one.
Before:
[probe:schedule_0] schedule+10 prev cpu
After:
probe:schedule_0 (on schedule+10 with prev cpu)
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091208220240.10142.42916.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Fri, 2009-12-04 at 20:59 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2009-12-04 at 10:34 +0100, Jean Delvare wrote:
> > I've sent it to linuxppc-dev@ozlabs.org on October 14th. This is the
> > address which is listed 22 times in MAINTAINERS. If it isn't correct,
> > then please update MAINTAINERS.
> No it's fine both shoul work. Your patches are there, just waiting for
> me to pick them up, I was just firing a reminder to the rest of the CC
> list :-) (and I do remember fwd'ing a couple of your patches to the
> list, for some reason they didn't make it to patchwork back then, that
> was a few month ago).
> Anyways, I've been stretched thin with all sort of stuff lately, so bear
> with me if I'm a bit slow at taking or testing stuff, I'm doing my best.
Adding patterns to the PowerPC sections of MAINTAINERS is useful.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment when we EOI an interrupt we set the CPPR back to 0xFF
regardless of its previous value. This could lead to problems if we
take an interrupt with a priority of 5, but before EOIing it we get
an IPI which has a priority of 4. The problem is that at the moment
when we EOI the IPI we will set the CPPR to 0xFF, but it should
really be set back to 5 (the previous priority).
To keep track of the previous CPPR values we create the xics_cppr
structure that has an array for CPPR values and an index pointing
to the current priority. This can easily grow if new priorities get
added in the future.
This will also be useful because the partition adjunct option of
upcoming machines will update the H_XIRR hcall to accept the CPPR
as a parameter.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The recent patch to add cpu offline/online as part of the DLPAR
process for pseries causes a build break if CONFIG_SMP is not
defined. Original patch here;
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-November/078299.html
This corrects the build break by moving the online_node_cpus
and offline_node_cpus under the #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
portions of dlpar.c.
This patch also slightly modifies the online_node_cpus and offline_node_cpus
routines to prepend dlpar_ to the them and make them static. These two
routine are only used in the dlpar add/remove of cpus and these changes
should help clarify that.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Writing a driver using SCLPC on the MPC5200B I detected, that the
intspec arrays to map irqs to Linux virq cannot be const, because the
mapping and xlate functions only take non const pointers. All those
functions do not modify the intspec, so a const pointer could be used.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use symbolic constant for PRESENT and avoid branching.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is no need to do set the DIRTY bit directly in DTLB Error.
Trap to do_page_fault() and let the generic MM code do the work.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that 8xx can fixup dcbX instructions, start using them
where possible like every other PowerPc arch do.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
8xx has not had WRITETHRU due to lack of bits in the pte.
After the recent rewrite of the 8xx TLB code, there are
two bits left. Use one of them to WRITETHRU.
Perhaps use the last SW bit to PAGE_SPECIAL or PAGE_FILE?
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
only DTLB Miss did set this bit, DTLB Error needs too otherwise
the setting is lost when the page becomes dirty.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is an assembler version to fixup DAR not being set
by dcbX, icbi instructions. There are two versions, one
uses selfmodifing code, the other uses a
jump table but is much bigger(default).
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
dcbz, dcbf, dcbi, dcbst and icbi do not set DAR when they
cause a DTLB Error. Dectect this by tagging DAR with 0x00f0
at every exception exit that modifies DAR.
Test for DAR=0x00f0 in DataTLBError and bail
to handle_page_fault().
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
Get rid of _PAGE_HWWRITE too.
Pros:
- I/D TLB Miss never needs to write to the linux pte.
- _PAGE_ACCESSED is only set on TLB Error fixing accounting
- _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
when a page has been made dirty.
- Proper RO/RW mapping of user space.
- Free up 2 SW TLB bits in the linux pte(add back _PAGE_WRITETHRU ?)
- kernel RO/user NA support.
Cons:
- A few more instructions in the TLB Miss routines.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.
These must be invalidated separaly as linux mm don't.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently the cpu-allocation/deallocation process comprises of two steps:
- Set the indicators and to update the device tree with DLPAR node
information.
- Online/offline the allocated/deallocated CPU.
This is achieved by writing to the sysfs tunables "probe" during allocation
and "release" during deallocation.
At the sametime, the userspace can independently online/offline the CPUs of
the system using the sysfs tunable "online".
It is quite possible that when a userspace tool offlines a CPU
for the purpose of deallocation and is in the process of updating the device
tree, some other userspace tool could bring the CPU back online by writing to
the "online" sysfs tunable thereby causing the deallocate process to fail.
The solution to this is to serialize writes to the "probe/release" sysfs
tunable with the writes to the "online" sysfs tunable.
This patch employs a mutex to provide this serialization, which is a no-op on
all architectures except PPC_PSERIES
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently the cpu-allocation/deallocation on pSeries is a
two step process from the Userspace.
- Set the indicators and update the device tree by writing to the sysfs
tunable "probe" during allocation and "release" during deallocation.
- Online / Offline the CPUs of the allocated/would_be_deallocated node by
writing to the sysfs tunable "online".
This patch adds kernel code to online/offline the CPUs soon_after/just_before
they have been allocated/would_be_deallocated. This way, the userspace tool
that performs DLPAR operations would only have to deal with one set of sysfs
tunables namely "probe" and release".
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Remove the CPU from the online map to prevent smp_call_function
from sending messages to a stopped CPU.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the specific routines to probe and release (add and remove)
cpu resource for the powerpc pseries platform and registers these handlers
with the ppc_md callout structure.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Version 3 of this patch is updated with documentation added to
Documentation/ABI. There are no changes to any of the C code from v2
of the patch.
In order to support kernel DLPAR of CPU resources we need to provide an
interface to add (probe) and remove (release) the resource from the system.
This patch Creates new generic probe and release sysfs files to facilitate
cpu probe/release. The probe/release interface provides for allowing each
arch to supply their own routines for implementing the backend of adding
and removing cpus to/from the system.
This also creates the powerpc specific stubs to handle the arch callouts
from writes to the sysfs files.
The creation and use of these files is regulated by the
CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
capability will have the files created.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Dynamic Logical Partitioning capabilities of the powerpc pseries platform
allows for the addition and removal of resources (i.e. CPU's, memory, and PCI
devices) from a partition. The removal of a resource involves
removing the resource's node from the device tree and then returning the
resource to firmware via the rtas set-indicator call. To add a resource, it
is first obtained from firmware via the rtas set-indicator call and then a
new device tree node is created using the ibm,configure-coinnector rtas call
and added to the device tree.
This patch provides the kernel DLPAR infrastructure in a new filed named
dlpar.c. The functionality provided is for acquiring and releasing a resource
from firmware and the parsing of information returned from the
ibm,configure-connector rtas call. Additionally this exports the pSeries
reconfiguration notifier chain so that it can be invoked when device tree
updates are made.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In commit 0512a9a8e2, we unilaterally zero the
"pwm invert" bit in the fan behavior configuration register. On my PowerBook
G4, this results in the fans going to full speed at low temperature and
shutting off at high temperature because the pwm invert bit is supposed to be
set.
Therefore, record the pwm invert bit at driver load time, and write the bit
into the fan behavior control register. This restores correct behavior on my
PBG4 and should work around the bit being set to the wrong value after
suspend/resume (which is what the original patch was trying to fix). It also
fixes a minor omission where the pwm invert bit correction is NOT performed
when switching into automatic mode.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
CC: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This allows offb to be used for initial framebuffer,
and a kms driver to take over later in the boot sequence.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a libata driver for the "macio" IDE controller used on most Apple
PowerMac and PowerBooks. It's a libata equivalent of drivers/ide/ppc/pmac.c
It supports all the features of its predecessor, including mediabay hotplug
and suspend/resume. It should also support module load/unload.
The timing calculations have been simplified to use pre-calculated tables
compared to drivers/ide/pmac.c and it uses the new mediabay interface
provided by a previous patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tejun Heo <tj@kernel.org>
In libata-sff, ata_sff_post_internal_cmd() directly calls ata_bmdma_stop()
instead of ap->ops->bmdma_stop(). This can be a problem for controllers
that use their own bmdma_stop for which the generic sff one isn't suitable
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
The hotplug mediabay has tendrils deep into drivers/ide code
which makes a libata port reather difficult. In addition it's
ugly and could be done better.
This reworks the interface between the mediabay and the rest
of the world so that:
- Any macio_driver can now have a mediabay_event callback
which will be called when that driver sits on a mediabay and
it's been either plugged or unplugged. The device type is
passed as an argument. We can now move all the IDE cruft
into the IDE driver itself
- A check_media_bay() function can be used to take a peek
at the type of device currently in the bay if any, a cleaner
variant of the previous function with the same name.
- A pair of lock/unlock functions are exposed to allow the
IDE driver to block the hotplug callbacks during the initial
setup and probing of the bay in order to avoid nasty race
conditions.
- The mediabay code no longer needs to spin on the status
register of the IDE interface when it detects an IDE device,
this is done just fine by the IDE code itself
Overall, less code, simpler, and allows for another driver
than our old drivers/ide based one.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
About 50% of shutdowns of b44 Ethernet adapter ends by kernel panic
with kernels compiled with stack-protector.
Checking b44_magic_pattern() return values, one call of
b44_magic_pattern() returns 127. It means, that set_bit(128, pmask)
was called on line 1509. It means that bit 0 of 17th byte of pmask was
overwritten. But pmask has only 16 bytes. Stack corruption happens.
It seems that set_bit() on line 1509 always writes one bit off.
The fix does not only solve the stack corruption, but also makes Wake
On LAN working on my onboard B44 on Asus A7V-333X mainboard.
It seems that this problem affects all kernel versions since commit
725ad800 ([PATCH] b44: add wol for old nic) on 2006-06-20.
Signed-off-by: Stanislav Brabec <sbrabec@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
It can happen, that tcp_retransmit_skb fails due to some error.
In such cases we might end up into a state where tp->retrans_out
is zero but that's only because we removed the TCPCB_SACKED_RETRANS
bit from a segment but couldn't retransmit it because of the error
that happened. Therefore some assumptions that retrans_out checks
are based do not necessarily hold, as there still can be an old
retransmission but that is only visible in TCPCB_EVER_RETRANS bit.
As retransmission happen in sequential order (except for some very
rare corner cases), it's enough to check the head skb for that bit.
Main reason for all this complexity is the fact that connection dying
time now depends on the validity of the retrans_stamp, in particular,
that successive retransmissions of a segment must not advance
retrans_stamp under any conditions. It seems after quick thinking that
this has relatively low impact as eventually TCP will go into CA_Loss
and either use the existing check for !retrans_stamp case or send a
retransmission successfully, setting a new base time for the dying
timer (can happen only once). At worst, the dying time will be
approximately the double of the intented time. In addition,
tcp_packet_delayed() will return wrong result (has some cc aspects
but due to rarity of these errors, it's hardly an issue).
One of retrans_stamp clearing happens indirectly through first going
into CA_Open state and then a later ACK lets the clearing to happen.
Thus tcp_try_keep_open has to be modified too.
Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting
that this possibility exists (though the particular case discussed
didn't after all have it happening but was just a debug patch
artifact).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves retransmits_timed_out() from include/net/tcp.h
to tcp_timer.c, where it is used.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.
With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.
This bug has been introduced together with retransmits_timed_out() in
2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c8503 (Revert Backoff [v3]:
Calculate TCP's connection close threshold as a time value.).
Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
use common_task instead of reset_task and link_chg_task, so it fix "call cancel_work_sync
from the work itself".
Signed-off-by: Jie Yang <jie.yang@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add pci map direction in atl1c_buffer flags, it is used when call pci_unmap
apis.
Signed-off-by: Jie Yang <jie.yang@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unified firmware image may not contain MN type of firmware.
Driver should fall back to NOMN firmware type instead
of going to flash.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o netif_running() check for enabling interrupt at end of napi poll is
not enough to cover firmwar recovery. Instead test __NX_DEV_UP bit.
o Avoid re-entry into to netxen_nic_down() with __NX_DEV_UP bit check.
Acked-by: Dhananjay Phadke <dhananjay.phadke@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o To prevent race conditions with other reset events.
During suspend/resume and firmware recovery, acquire rtnl_lock,
while changing interface state.
Acked-by: Dhananjay Phadke <dhananjay.phadke@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various additions and improvements to the Gigaset driver's README
file, and added comments to its userspace visible include file.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
When built with debugging support, the Gigaset driver enabled some
debugging messages by default. Change the default to "all off".
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the locking scheme on the fec_mpc52xx driver. This device can
receive IRQs from three sources; the FEC itself, the tx DMA, and the
rx DMA. Mutual exclusion was handled by taking a spin_lock() in the
critical regions, but because the handlers are run with IRQs enabled,
spin_lock() is insufficient and the driver can end up interrupting
a critical region anyway from another IRQ.
Asier Llano discovered that this occurs when an error IRQ is raised
in the middle of handling rx irqs which resulted in an sk_buff memory
leak.
In addition, locking is spotty at best in the driver and inspection
revealed quite a few places with insufficient locking.
This patch is based on Asier's initial work, but reworks a number of
things so that locks are held for as short a time as possible, so
that spin_lock_irqsave() is used everywhere, and so the locks are
dropped when calling into the network stack (because the lock only
protects the hardware interface; not the network stack).
Boot tested on a lite5200 with an NFS root. Has not been performance
tested.
Signed-off-by: Asier Llano <a.llano@ziv.es>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a more effective rss hash by default (src + dst, rather than just
src).
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in routed mode, we don't have a hardware address so netdev_ops doesnt
need to validate our hardware address via .ndo_validate_addr
Reported-by: Manuel Fuentes <mfuentes@agenciaefe.com>
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
due to reference counting sk_wmem_alloc now has a value of 1 when all
the outstanding data has been sent.
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix oops when initializing lane interfaces. lec should probably be
changed to use alloc_netdev() instead.
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds kerneldoc for inet_twsk_unhash() & inet_twsk_bind_unhash().
With help from Randy Dunlap.
Suggested-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.
Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.
Fix is to remove timewait socket from bind list before releasing the bind lock.
Note: This problem happens if sysctl_tcp_tw_reuse is set.
Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>