Segmentation and Reassembly (SAR) occupies different windows in standard and
extended control fields. Convert hardcoded masks to relative ones and use shift
to access SAR bits.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Supervisory bits occupy different windows in standard / extended control
fields. Convert hardcoded masks to relative ones and use shift to access
S-bit window.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Adds extended control field bit masks and rearrange defines to logical
groups: masks, flags and shift groups.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Adds support for extended window size (EWS) config option. We enable EWS
feature in L2CAP Info RSP when hs enabled. EWS option is included in L2CAP
Config Req if tx_win (which is set via socket) bigger then standard default
value (63) && hs enabled && remote side supports EWS feature.
Using EWS selects extended control field in L2CAP.
Code partly based on Qualcomm and Atheros patches sent upstream a year ago.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch do the following things:
1. Add header and Copyright for marvell usb driver.
2. Add mv_usb.h in include/linux/platform_data, make the driver
fits all the marvell platform using the same ChipIdea usb ip.
3. Some SOC may has mutiple clock sources, so let me define it
in mv_usb_platform_data and give two helper functions named
udc_clock_enable/udc_clock_disable to deal with the clocks.
4. Different SOCs will have some difference in PHY initialization,
so we will remove file mv_udc_phy.c and add two funtions in
mv_usb_platform_data, let the platform relative driver to realize it.
5. Rewrite probe function according to the modification list above. Find
it will kernel panic when probe failed. The root cause is as follows:
When probe failed, the error handle may call device_unregister()
which in return will call gadget_release.In current code,
gadget_release have two issues:
1: the_controller is a NULL pointer.
2: if we free udc here, then the following code in probe
will access NULL pointer.
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
some renesas_usbhs device is supporting OTG external device interface.
In that device, it is necessary to control PWEN/EXTLP on DVSTCTR.
This patch support it.
But renesas_usbhs driver doesn't have OTG support for now.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
this patch add DVSTCTR control function for HOST support
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
renesas_usbhs will have register DVSTCTR control function for HOST support.
This patch changes usbhsc_bus_ctrl() to usbsc_set_buswait(),
to remove DVSTCTR access from it,
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
SH7757 has a USB function with internal DMA controller (SUDMAC).
This patch supports the SUDMAC. The SUDMAC is incompatible with
general-purpose DMAC. So, it doesn't use dmaengine.
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
This creates a subsystem for handling of pin control devices.
These are devices that control different aspects of package
pins.
Currently it handles pinmuxing, i.e. assigning electronic
functions to groups of pins on primarily PGA and BGA type of
chip packages which are common in embedded systems.
The plan is to also handle other I/O pin control aspects
such as biasing, driving, input properties such as
schmitt-triggering, load capacitance etc within this
subsystem, to remove a lot of ARM arch code as well as
feature-creepy GPIO drivers which are implementing the same
thing over and over again.
This is being done to depopulate the arch/arm/* directory
of such custom drivers and try to abstract the infrastructure
they all need. See the Documentation/pinctrl.txt file that is
part of this patch for more details.
ChangeLog v1->v2:
- Various minor fixes from Joe's and Stephens review comments
- Added a pinmux_config() that can invoke custom configuration
with arbitrary data passed in or out to/from the pinmux driver
ChangeLog v2->v3:
- Renamed subsystem folder to "pinctrl" since we will likely
want to keep other pin control such as biasing in this
subsystem too, so let us keep to something generic even though
we're mainly doing pinmux now.
- As a consequence, register pins as an abstract entity separate
from the pinmux. The muxing functions will claim pins out of the
pin pool and make sure they do not collide. Pins can now be
named by the pinctrl core.
- Converted the pin lookup from a static array into a radix tree,
I agreed with Grant Likely to try to avoid any static allocation
(which is crap for device tree stuff) so I just rewrote this
to be dynamic, just like irq number descriptors. The
platform-wide definition of number of pins goes away - this is
now just the sum total of the pins registered to the subsystem.
- Make sure mappings with only a function name and no device
works properly.
ChangeLog v3->v4:
- Define a number space per controller instead of globally,
Stephen and Grant requested the same thing so now maps need to
define target controller, and the radix tree of pin descriptors
is a property on each pin controller device.
- Add a compulsory pinctrl device entry to the pinctrl mapping
table. This must match the pinctrl device, like "pinctrl.0"
- Split the file core.c in two: core.c and pinmux.c where the
latter carry all pinmux stuff, the core is for generic pin
control, and use local headers to access functionality between
files. It is now possible to implement a "blank" pin controller
without pinmux capabilities. This split will make new additions
like pindrive.c, pinbias.c etc possible for combined drivers
and chunks of functionality which is a GoodThing(TM).
- Rewrite the interaction with the GPIO subsystem - the pin
controller descriptor now handles this by defining an offset
into the GPIO numberspace for its handled pin range. This is
used to look up the apropriate pin controller for a GPIO pin.
Then that specific GPIO range is matched 1-1 for the target
controller instance.
- Fixed a number of review comments from Joe Perches.
- Broke out a header file pinctrl.h for the core pin handling
stuff that will be reused by other stuff than pinmux.
- Fixed some erroneous EXPORT() stuff.
- Remove mispatched U300 Kconfig and Makefile entries
- Fixed a number of review comments from Stephen Warren, not all
of them - still WIP. But I think the new mapping that will
specify which function goes to which pin mux controller address
50% of your concerns (else beat me up).
ChangeLog v4->v5:
- Defined a "position" for each function, so the pin controller now
tracks a function in a certain position, and the pinmux maps define
what position you want the function in. (Feedback from Stephen
Warren and Sascha Hauer).
- Since we now need to request a combined function+position from
the machine mapping table that connect mux settings to drivers,
it was extended with a position field and a name field. The
name field is now used if you e.g. need to switch between two
mux map settings at runtime.
- Switched from a class device to using struct bus_type for this
subsystem. Verified sysfs functionality: seems to work fine.
(Feedback from Arnd Bergmann and Greg Kroah-Hartman)
- Define a per pincontroller list of GPIO ranges from the GPIO
pin space that can be handled by the pin controller. These can
be added one by one at runtime. (Feedback from Barry Song)
- Expanded documentation of regulator_[get|enable|disable|put]
semantics.
- Fixed a number of review comments from Barry Song. (Thanks!)
ChangeLog v5->v6:
- Create an abstract pin group concept that can sort pins into
named and enumerated groups no matter what the use of these
groups may be, one possible usecase is a group of pins being
muxed in or so. The intention is however to also use these
groups for other pin control activities.
- Make it compulsory for pinmux functions to associate with
at least one group, so the abstract pin group concept is used
to define the groups of pins affected by a pinmux function.
The pinmux driver interface has been altered so as to enforce
a function to list applicable groups per function.
- Provide an optional .group entry in the pinmux machine map
so the map can select beteween different available groups
to be used with a certain function.
- Consequent changes all over the place so that e.g. debugfs
present reasonable information about the world.
- Drop the per-pin mux (*config) function in the pinmux_ops
struct - I was afraid that this would start to be used for
things totally unrelated to muxing, we can introduce that to
the generic struct pinctrl_ops if needed. I want to keep
muxing orthogonal to other pin control subjects and not mix
these things up.
ChangeLog v6->v7:
- Make it possible to have several map entries matching the
same device, pin controller and function, but using
a different group, and alter the semantics so that
pinmux_get() will pick all matching map entries, and
store the associated groups in a list. The list will
then be iterated over at pinmux_enable()/pinmux_disable()
and corresponding driver functions called for each
defined group. Notice that you're only allowed to map
multiple *groups* to the same
{ device, pin controller, function } triplet, attempts
to map the same device to multiple pin controllers will
for example fail. This is hopefully the crucial feature
requested by Stephen Warren.
- Add a pinmux hogging field to the pinmux mapping entries,
and enable the pinmux core to hog pinmux map entries.
This currently only works for pinmuxes without assigned
devices as it looks now, but with device trees we can
look up the corresponding struct device * entries when
we register the pinmux driver, and have it hog each
pinmux map in turn, for a simple approach to
non-dynamic pin muxing. This addresses an issue from
Grant Likely that the machine should take care of as
much of the pinmux setup as possible, not the devices.
By supplying a list of hogs, it can now instruct the
core to take care of any static mappings.
- Switch pinmux group retrieveal function to grab an
array of strings representing the groups rather than an
array of unsigned and rewrite accordingly.
- Alter debugfs to show the grouplist handled by each
pinmux. Also add a list of hogs.
- Dynamically allocate a struct pinmux at pinmux_get() and
free it at pinmux_put(), then add these to the global
list of pinmuxes active as we go along.
- Go over the list of pinmux maps at pinmux_get() time
and repeatedly apply matches.
- Retrieve applicable groups per function from the driver
as a string array rather than a unsigned array, then
lookup the enumerators.
- Make the device to pinmux map a singleton - only allow the
mapping table to be registered once and even tag the
registration function with __init so it surely won't be
abused.
- Create a separate debugfs file to view the pinmux map at
runtime.
- Introduce a spin lock to the pin descriptor struct, lock it
when modifying pin status entries. Reported by Stijn Devriendt.
- Fix up the documentation after review from Stephen Warren.
- Let the GPIO ranges give names as const char * instead of some
fixed-length string.
- add a function to unregister GPIO ranges to mirror the
registration function.
- Privatized the struct pinctrl_device and removed it from the
<linux/pinctrl/pinctrl.h> API, the drivers do not need to know
the members of this struct. It is now in the local header
"core.h".
- Rename the concept of "anonymous" mux maps to "system" muxes
and add convenience macros and documentation.
ChangeLog v7->v8:
- Delete the leftover pinmux_config() function from the
<linux/pinctrl/pinmux.h> header.
- Fix a race condition found by Stijn Devriendt in pin_request()
ChangeLog v8->v9:
- Drop the bus_type and the sysfs attributes and all, we're not on
the clear about how this should be used for e.g. userspace
interfaces so let us save this for the future.
- Use the right name in MAINTAINERS, PIN CONTROL rather than
PINMUX
- Don't kfree() the device state holder, let the .remove() callback
handle this.
- Fix up numerous kerneldoc headers to have one line for the function
description and more verbose documentation below the parameters
ChangeLog v9->v10:
- pinctrl: EXPORT_SYMBOL needs export.h, folded in a patch
from Steven Rothwell
- fix pinctrl_register error handling, folded in a patch from
Axel Lin
- Various fixes to documentation text so that it's consistent.
- Removed pointless comment from drivers/Kconfig
- Removed dependency on SYSFS since we removed the bus in
v9.
- Renamed hopelessly abbreviated pctldev_* functions to the
more verbose pinctrl_dev_*
- Drop mutex properly when looking up GPIO ranges
- Return NULL instead of ERR_PTR() errors on registration of
pin controllers, using cast pointers is fragile. We can
live without the detailed error codes for sure.
Cc: Stijn Devriendt <highguy@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This patch exposes the tos value for the TCP sockets when the TOS flag
is requested in the ext_flags for the inet_diag request. This would mainly be
used to expose TOS values for both for TCP and UDP sockets. Currently it is
supported for TCP. When netlink support for UDP would be added the support
to expose the TOS values would alse be done. For IPV4 tos value is exposed
and for IPV6 tclass value is exposed.
Signed-off-by: Murali Raja <muralira@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_vs_mutext is used by both netns shutdown code and startup
and both implicit uses sk_lock-AF_INET mutex.
cleanup CPU-1 startup CPU-2
ip_vs_dst_event() ip_vs_genl_set_cmd()
sk_lock-AF_INET __ip_vs_mutex
sk_lock-AF_INET
__ip_vs_mutex
* DEAD LOCK *
A new mutex placed in ip_vs netns struct called sync_mutex is added.
Comments from Julian and Simon added.
This patch has been running for more than 3 month now and it seems to work.
Ver. 3
IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
instead of __ip_vs_mutex as sugested by Julian.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We can now move the radiotap header parsing into
ieee80211_monitor_start_xmit(). This moves it out of
the hotpath, and also helps the code since now the
radiotap header will no longer be present in
ieee80211_xmit() etc. which is easier to understand.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The purpose of this is two-fold:
1) by moving it out of tx_data.flags, we can in
another patch move the radiotap parsing so it
no longer is in the hotpath
2) if a device implements fragmentation but can
optionally skip it, the radiotap request for
not doing fragmentation may be honoured
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is only used/needed by the vmbus core code, so move it out of the
hyperv.h file and into the .c file that uses it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This function is only used in the file it is declared in
(channel_mgmt.c) so make it static and remove it from the hyperv.h file.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This file was created by mushing different .h files together and it
shows. This change removes some unneeded forward declarations.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I have no idea what these were ever for, but they aren't used, so delete
them.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They aren't used anywhere anymore now that the debugging macros are
gone, so remove it from hyperv.h as well.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As there is no user of this variable, it's time to delete it. For
dynamic debugging of the hyperv code, use the standard dynamic debug
kernel interface.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
These aren't used by anyone anymore, so remove them before someone tries
to use them again.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It's a global symbol, so properly prefix it and use the proper EXPORT
value as well.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No one outside of the hyperv core needs to include the asm/hyperv.h
file, so don't put it in the "global" include/linux/hyperv.h file.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
role_switch variable inside l2cap_chan is a logical one and can
be easily converted to flag
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
force_active variable inside l2cap_chan is a logical one and can
be easily converted to flag
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
force_reliable variable inside l2cap_chan is a logical one and can
be easily converted to flag
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
flushable variable inside l2cap_chan is a logical one and can
be easily converted to flag. Added flags in l2cap_chan structure.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Commit 1230db8e15 ("llist: Make some llist functions inline")
has deleted the definitions, causing problems for (not upstream yet)
code that tries to make use of them.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: David Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20111005172528.0d0a8afc65acef7ace22a24e@canb.auug.org.au
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After many years wandering the desert, it is finally time for the
Microsoft HyperV code to move out of the staging directory. Or at least
the core hyperv bus code, and the utility driver, the rest still have
some review to get through by the various subsystem maintainers.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
The Kirkwood gave an unaligned memory access error on
line 742 of drivers/usb/host/echi-hcd.c:
"ehci->last_periodic_enable = ktime_get_real();"
Signed-off-by: Harro Haan <hrhaan@gmail.com>
Cc: stable <stable@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* pm-domains:
PM / Domains: Split device PM domain data into base and need_restore
ARM: mach-shmobile: sh7372 sleep warning fixes
ARM: mach-shmobile: sh7372 A3SM support
ARM: mach-shmobile: sh7372 generic suspend/resume support
PM / Domains: Preliminary support for devices with power.irq_safe set
PM: Move clock-related definitions and headers to separate file
PM / Domains: Use power.sybsys_data to reduce overhead
PM: Reference counting of power.subsys_data
PM: Introduce struct pm_subsys_data
ARM / shmobile: Make A3RV be a subdomain of A4LC on SH7372
PM / Domains: Rename argument of pm_genpd_add_subdomain()
PM / Domains: Rename GPD_STATE_WAIT_PARENT to GPD_STATE_WAIT_MASTER
PM / Domains: Allow generic PM domains to have multiple masters
PM / Domains: Add "wait for parent" status for generic PM domains
PM / Domains: Make pm_genpd_poweron() always survive parent removal
PM / Domains: Do not take parent locks to modify subdomain counters
PM / Domains: Implement subdomain counters as atomic fields
* pm-runtime:
PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
PM / Runtime: Replace dev_dbg() with trace_rpm_*()
PM / Runtime: Introduce trace points for tracing rpm_* functions
PM / Runtime: Don't run callbacks under lock for power.irq_safe set
USB: Add wakeup info to debugging messages
PM / Runtime: pm_runtime_idle() can be called in atomic context
PM / Runtime: Add macro to test for runtime PM events
PM / Runtime: Add might_sleep() to runtime PM functions
To avoid ifdefs in the other code that supports DCB notifiers
add stub routines. This method seems popular in other net code
for example 8021Q.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add DCBX mode to event notifiers so listeners can learn
currently enabled mode.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ifindex instead of ifname in the DCB app ring. This makes for a smaller
data structure and faster comparisons. It also avoids possible issues when
a net device is renamed.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The two new attributes exclude_guest and exclude_host can
bes used by user-space to tell the kernel to setup
performance counter to either only count while the CPU is in
guest or in host mode.
An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The clocksource name should be const for correctness.
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
To read the current PM QoS value for a given device we need to
make sure that the device's power.constraints object won't be
removed while we're doing that. For this reason, put the
operation under dev->power.lock and acquire the lock
around the initialization and removal of power.constraints.
Moreover, since we're using the value of power.constraints to
determine whether or not the object is present, the
power.constraints_state field isn't necessary any more and may be
removed. However, dev_pm_qos_add_request() needs to check if the
device is being removed from the system before allocating a new
PM QoS constraints object for it, so make it use the
power.power_state field of struct device for this purpose.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Initial benchmarks show they're a net loss:
$ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
$ echo 4096 32000 64 128 > /proc/sys/kernel/sem
$ ./sembench -t 2048 -w 1900 -o 0
Pre:
run time 30 seconds 778936 worker burns per second
run time 30 seconds 912190 worker burns per second
run time 30 seconds 817506 worker burns per second
run time 30 seconds 830870 worker burns per second
run time 30 seconds 845056 worker burns per second
Post:
run time 30 seconds 905920 worker burns per second
run time 30 seconds 849046 worker burns per second
run time 30 seconds 886286 worker burns per second
run time 30 seconds 822320 worker burns per second
run time 30 seconds 900283 worker burns per second
So about 4% faster. (!)
cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:
- allows SMT siblings to have a go;
- reduces pressure on the CPU interconnect.
However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.
A typical cmpxchg loop should not go round more than a handfull of
times at worst, therefore adding extra delays just slows things down.
Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use the generic llist primitives.
We had a private lockless list implementation in the scheduler in the wake-list
code, now that we have a generic llist implementation that provides all required
operations, switch to it.
This patch is not expected to change any behavior.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836353.26517.42.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So we don't have to expose the struct list_node member.
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315836348.26517.41.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use llist in irq_work instead of the lock-less linked list
implementation in irq_work to avoid the code duplication.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-6-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Extend the llist_add*() functions to return a success indicator, this
allows us in the scheduler code to send an IPI if the queue was empty.
( There's no effect on existing users, because the list_add_xxx() functions
are inline, thus this will be optimized out by the compiler if not used
by callers. )
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-5-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If in llist_add()/etc. functions the first cmpxchg() call succeeds, it is
not necessary to use cpu_relax() before the cmpxchg(). So cpu_relax() in
a busy loop involving cmpxchg() should go after cmpxchg() instead of before
that.
This patch fixes this for all involved llist functions.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-4-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove the nmi() checks spread around the code. in_nmi() is not available
on every architecture and it's a pretty obscure and ugly check in any case.
Cc: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-3-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the pNFS obj-LD the device table at the layout level needs
to point to a device_cache node, where it is possible and likely
that many layouts will point to the same device-nodes.
In Exofs we have a more orderly structure where we have a single
array of devices that repeats twice for a round-robin view of the
device table
This patch moves to a model that can be used by the pNFS obj-LD
where struct ore_components holds an array of ore_dev-pointers.
(ore_dev is newly defined and contains a struct osd_dev *od
member)
Each pointer in the array of pointers will point to a bigger
user-defined dev_struct. That can be accessed by use of the
container_of macro.
In Exofs an __alloc_dev_table() function allocates the
ore_dev-pointers array as well as an exofs_dev array, in one
allocation and does the addresses dance to set everything pointing
correctly. It still keeps the double allocation trick for the
inodes round-robin view of the table.
The device table is always allocated dynamically, also for the
single device case. So it is unconditionally freed at umount.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Because llist code will be used in performance critical scheduler
code path, make llist_add() and llist_del_all() inline to avoid
function calling overhead and related 'glue' overhead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1315461646-1379-2-git-send-email-ying.huang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Each event adds some points to its counters. By default it adds 1,
and a number of points may be transmited in event's parameters.
E.g. sched:sched_stat_runtime adds how long process has been running.
But this functionality was broken by v2.6.31-rc5-392-gf413cdb
and now the event's parameters doesn't affect on a number of points.
TP_perf_assign isn't defined, so __perf_count(c) isn't executed and
__count is always equal to 1.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
tx params should be configured per interface.
add ieee80211_vif param to the conf_tx callback,
and change all the drivers that use this callback.
The following spatch was used:
@rule1@
struct ieee80211_ops ops;
identifier conf_tx_op;
@@
ops.conf_tx = conf_tx_op;
@rule2@
identifier rule1.conf_tx_op;
identifier hw, queue, params;
@@
conf_tx_op (
- struct ieee80211_hw *hw,
+ struct ieee80211_hw *hw, struct ieee80211_vif *vif,
u16 queue,
const struct ieee80211_tx_queue_params *params) {...}
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Recently mac80211 was changed to use nullfunc instead of probe
request for connection monitoring for tx ack status reporting
hardwares. Sometimes in congested network, STA got disconnected
quickly after the association. It was observered that the rate
control was not adopted to environment due to minimal transmission.
As the nullfunc are used for monitoring purpose, these frames should
not be sacrificed for rate control updation. So it is better to send
the monitoring null func frames at minimum rate that could help to
retain the connection.
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a gpio setup function which gives a chance to set up
platform specific configuration such as pin multiplexing,
input/output direction at the runtime or booting time.
Signed-off-by: Sangwook Lee <sangwook.lee@linaro.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Allow injected unicast frames to be sent without having to wait
for an ACK.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
removing obsoleted sysctl,
ip_rt_gc_interval variable no longer used since 2.6.38
Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This repairs problem with compile library in userspace (libnl).
Signed-off-by: Jiří Župka <jzupka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows ss command (iproute2) to display "ecnseen" if at least one packet
with ECT(0) or ECT(1) or ECN was received by this socket.
"ecn" means ECN was negotiated at session establishment (TCP level)
"ecnseen" means we received at least one packet with ECT fields set (IP
level)
ss -i
...
ESTAB 0 0 192.168.20.110:22 192.168.20.144:38016
ino:5950 sk:f178e400
mem:(r0,w0,f0,t0) ts sack ecn ecnseen bic wscale:7,8 rto:210
rtt:12.5/7.5 cwnd:10 send 9.3Mbps rcv_space:14480
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct ore_striping_info will be used later in other
structures. And ore_calc_stripe_info as well. Rename them
make struct ore_striping_info public. ore_calc_stripe_info
is still static, will be made public on first use.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
The struct pnfs_osd_data_map data_map member of exofs_sb_info was
never used after mount. In fact all it's members were duplicated
by the ore_layout structure. So just remove the duplicated information.
Also removed some stupid, but perfectly supported, restrictions on
layout parameters. The case where num_devices is not divisible by
mirror_count+1 is perfectly fine since the rotating device view
will eventually use all the devices it can get.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
ore_components already has a comps member so this leads
to things like comps->comps which is annoying. the name oc
was already used in new code. So rename all old usage of
ore_components comps => ore_components oc.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
As request_percpu_irq() doesn't allow for a percpu interrupt to have
its type configured (it is generally impossible to configure it on all
CPUs at once), add a 'type' argument to enable_percpu_irq().
This allows some low-level, board specific init code to be switched to
a generic API.
[ tglx: Added WARN_ON argument ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
which are usually used to connect local timers to each core. Each CPU
has its own private interface to the GIC, and only sees the PPIs that
are directly connect to it.
While these timers are separate devices and have a separate interrupt
line to a core, they all use the same IRQ number.
For these devices, request_irq() is not the right API as it assumes
that an IRQ number is visible by a number of CPUs (through the
affinity setting), but makes it very awkward to express that an IRQ
number can be handled by all CPUs, and yet be a different interrupt
line on each CPU, requiring a different dev_id cookie to be passed
back to the handler.
The *_percpu_irq() functions is designed to overcome these
limitations, by providing a per-cpu dev_id vector:
int request_percpu_irq(unsigned int irq, irq_handler_t handler,
const char *devname, void __percpu *percpu_dev_id);
void free_percpu_irq(unsigned int, void __percpu *);
int setup_percpu_irq(unsigned int irq, struct irqaction *new);
void remove_percpu_irq(unsigned int irq, struct irqaction *act);
void enable_percpu_irq(unsigned int irq);
void disable_percpu_irq(unsigned int irq);
The API has a number of limitations:
- no interrupt sharing
- no threading
- common handler across all the CPUs
Once the interrupt is requested using setup_percpu_irq() or
request_percpu_irq(), it must be enabled by each core that wishes its
local interrupt to be delivered.
Based on an initial patch by Thomas Gleixner.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Four cpufreq-like governors are provided as examples.
powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.
performance: use the highest freqeuncy possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.
userspace: use the user specified frequency stored at
devfreq.user_set_freq. With sysfs support in the following patch, a user
may set the value with the sysfs interface.
simple_ondemand: simplified version of cpufreq's ondemand governor.
When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies devfreq to update operating frequency
accordingly. Thus, devfreq users (device drivers) do not need to update
devfreq manually with OPP entry updates or set polling_ms for powersave
, performance, userspace, or any other "static" governors.
Note that these are given only as basic examples for governors and any
devices with devfreq may implement their own governors with the drivers
and use them.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
With OPPs, a device may have multiple operable frequency and voltage
sets. However, there can be multiple possible operable sets and a system
will need to choose one from them. In order to reduce the power
consumption (by reducing frequency and voltage) without affecting the
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.
This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a techique whereby the frequency and supplied voltage of a
device is adjusted on-the-fly. DVFS usually sets the frequency as low
as possible with given conditions (such as QoS assurance) and adjusts
voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.
The generic DVFS for devices, devfreq, may appear quite similar with
/drivers/cpufreq. However, cpufreq does not allow to have multiple
devices registered and is not suitable to have multiple heterogenous
devices with different (but simple) governors.
Normally, DVFS mechanism controls frequency based on the demand for
the device, and then, chooses voltage based on the chosen frequency.
devfreq also controls the frequency based on the governor's frequency
recommendation and let OPP pick up the pair of frequency and voltage
based on the recommended frequency. Then, the chosen OPP is passed to
device driver's "target" callback.
When PM QoS is going to be used with the devfreq device, the device
driver should enable OPPs that are appropriate with the current PM QoS
requests. In order to do so, the device driver may call opp_enable and
opp_disable at the notifier callback of PM QoS so that PM QoS's
update_target() call enables the appropriate OPPs. Note that at least
one of OPPs should be enabled at any time; be careful when there is a
transition.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
irq: Fix check for already initialized irq_domain in irq_domain_add
irq: Add declaration of irq_domain_simple_ops to irqdomain.h
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86/rtc: Don't recursively acquire rtc_lock
* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
posix-cpu-timers: Cure SMP wobbles
sched: Fix up wchan borkage
sched/rt: Migrate equal priority tasks to available CPUs
The patch enables to register notifier_block for an OPP-device in order
to get notified for any changes in the availability of OPPs of the
device. For example, if a new OPP is inserted or enable/disable status
of an OPP is changed, the notifier is executed.
This enables the usage of opp_add, opp_enable, and opp_disable to
directly take effect with any connected entities such as cpufreq or
devfreq.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mike Turquette <mturquette@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
With the addition of uAPSD and driver buffering
the powersave handling has gotten quite complex.
Add a section to the documentation to explain it
for anyone wanting to implement it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
iwlwifi has a separate EOSP notification from
the device, and to make use of that properly
it needs to be passed to mac80211. To be able
to mix with tx_status_irqsafe and rx_irqsafe
it also needs to be an "_irqsafe" version in
the sense that it goes through the tasklet,
the actual flag clearing would be IRQ-safe
but doing it directly would cause reordering
issues.
This is needed in the case of a P2P GO going
into an absence period without transmitting
any frames that should be driver-released as
in this case there's no other way to inform
mac80211 that the service period ended. Note
that for drivers that don't use the _irqsafe
functions another version of this function
will be required.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
iwlwifi needs to know the number of frames that are
going to be sent to a station while it is asleep so
it can properly handle the uCode blocking of that
station.
Before uAPSD, we got by by telling the device that
a single frame was going to be released whenever we
encountered IEEE80211_TX_CTL_POLL_RESPONSE. With
uAPSD, however, that is no longer possible since
there could be more than a single frame.
To support this model, add a new callback to notify
drivers when frames are going to be released.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a PS-poll frame is retried (but was received)
there is no way to detect that since it has no
sequence number. As a consequence, the standard
asks us to not react to PS-poll frames until the
response to one made it out (was ACKed or lost).
Implement this by using the WLAN_STA_SP flags to
also indicate a PS-Poll "service period" and the
IEEE80211_TX_STATUS_EOSP flag for the response
packet to indicate the end of the "SP" as usual.
We could use separate flags, but that will most
likely completely confuse drivers, and while the
standard doesn't exclude simultaneously polling
using uAPSD and PS-Poll, doing that seems quite
problematic.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add uAPSD support to mac80211. This is probably not
possible with all devices, so advertising it with
the cfg80211 flag will be left up to drivers that
want it.
Due to my previous patches it is now a fairly
straight-forward extension. Drivers need to have
accurate TX status reporting for the EOSP frame.
For drivers that buffer themselves, the provided
APIs allow releasing the right number of frames,
but then drivers need to set EOSP and more-data
themselves. This is documented in more detail in
the new code itself.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If there are frames for a station buffered in
the driver, mac80211 announces those in the TIM
IE but there's no way to release them. Add new
API to release such frames and use it when the
station polls for a frame.
Since the API will soon also be used for uAPSD
it is easily extensible.
Note that before this change drivers announcing
driver-buffered frames in the TIM bit actually
will respond to a PS-Poll with a potentially
lower priority frame (if there are any frames
buffered in mac80211), after this patch a driver
that hasn't been changed will no longer respond
at all. This only affects ath9k, which will need
to be fixed to implement the new API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For uAPSD support we'll need to have per-AC PS
buffers. As this is a major undertaking, split
the buffers before really adding support for
uAPSD. This already makes some reference to the
uapsd_queues variable, but for now that will
never be non-zero.
Since book-keeping is complicated, also change
the logic for keeping a maximum of frames only
and allow 64 frames per AC (up from 128 for a
station).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For uAPSD implementation, it is necessary to know on
which ACs frames are buffered. mac80211 obviously
knows about the frames it has buffered itself, but
with aggregation many drivers buffer frames. Thus,
mac80211 needs to be informed about this.
For now, since we don't have APSD in any form, this
will unconditionally set the TIM bit for the station
but later with uAPSD only some ACs might cause the
TIM bit to be set.
ath9k is the only driver using this API and I only
modify it in the most basic way, it won't be able
to implement uAPSD with this yet. But it can't do
that anyway since there's no way to selectively
release frames to the peer yet.
Since drivers will buffer frames per TID, let them
inform mac80211 on a per TID basis, mac80211 will
then sort out the AC mapping itself.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When adding a TDLS peer STA, mark it with a new flag in both nl80211 and
mac80211. Before adding a peer, make sure the wiphy supports TDLS and
our operating mode is appropriate (managed).
In addition, make sure all peers are removed on disassociation.
A TDLS peer is first added just before link setup is initiated. In later
setup stages we have more info about peer supported rates, capabilities,
etc. This info is reported via nl80211_set_station().
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Register and implement the TDLS cfg80211 callback functions.
Internally prepare and send TDLS management frames. We incorporate
local STA capabilities and supported rates with extra IEs given by
usermode. The resulting packet is either encapsulated in a data frame,
or assembled as an action frame. It is transmitted either directly or
through the AP, as mandated by the TDLS specification.
Declare support for the TDLS external setup wiphy capability. This
tells usermode to handle link setup and discovery on its own, and use the
kernel driver for sending TDLS mgmt packets.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Relocate the mesh implementation of adding the (extended) supported
rates IE to util.c, anticipating its use by other parts of mac80211.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for sending high-level TDLS commands and TDLS frames via
NL80211_CMD_TDLS_OPER and NL80211_CMD_TDLS_MGMT, respectively. Add
appropriate cfg80211 callbacks for lower level drivers.
Add wiphy capability flags for TDLS support and advertise them via
nl80211.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: Kalyan C Gaddam <chakkal@iit.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently, when hostapd sets the station as authorized
we also overwrite its uAPSD parameter. This obviously
leads to buggy behaviour (later, with my patches that
actually add uAPSD support). To fix this, only apply
those parameters if they were actually set in nl80211,
and to achieve that add a bitmap of things to apply.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ensure we've got a function so users can enable/disable the
cache bypass option.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.
Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.
I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).
For example:
[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$
The diff of 'process' should always be >= the diff of 'thread'.
I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.
---
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
static pthread_barrier_t barrier;
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1)
__asm__ __volatile__("" : : : "memory");
return NULL;
}
int main(void)
{
clockid_t process_clock, my_thread_clock, th_clock;
struct timespec process_before, process_after;
struct timespec me_before, me_after;
struct timespec th_before, th_after;
struct timespec sleeptime;
unsigned long diff;
pthread_t th;
int err;
err = clock_getcpuclockid(0, &process_clock);
if (err)
return 1;
err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
if (err)
return 1;
pthread_barrier_init(&barrier, NULL, 2);
err = pthread_create(&th, NULL, chew_cpu, NULL);
if (err)
return 1;
err = pthread_getcpuclockid(th, &th_clock);
if (err)
return 1;
pthread_barrier_wait(&barrier);
err = clock_gettime(process_clock, &process_before);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_before);
if (err)
return 1;
err = clock_gettime(th_clock, &th_before);
if (err)
return 1;
sleeptime.tv_sec = 0;
sleeptime.tv_nsec = 500000000;
nanosleep(&sleeptime, NULL);
err = clock_gettime(th_clock, &th_after);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_after);
if (err)
return 1;
err = clock_gettime(process_clock, &process_after);
if (err)
return 1;
diff = process_after.tv_nsec - process_before.tv_nsec;
printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
diff = th_after.tv_nsec - th_before.tv_nsec;
printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
diff = me_after.tv_nsec - me_before.tv_nsec;
printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);
return 0;
}
This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.
This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().
Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add to the dev_state and alloc_async structures the user namespace
corresponding to the uid and euid. Pass these to kill_pid_info_as_uid(),
which can then implement a proper, user-namespace-aware uid check.
Changelog:
Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
uid, and euid each separately, pass a struct cred.
Sep 26: Address Alan Stern's comments: don't define a struct cred at
usbdev_open(), and take and put a cred at async_completed() to
ensure it lasts for the duration of kill_pid_info_as_cred().
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allow the xen balloon driver to populate its list of extra pages from
more than one region of memory. This will allow platforms to provide
(for example) a region of low memory and a region of high memory.
The maximum possible number of extra regions is 128 (== E820MAX) which
is quite large so xen_extra_mem is placed in __initdata. This is safe
as both xen_memory_setup() and balloon_init() are in __init.
The balloon regions themselves are not altered (i.e., there is still
only the one region).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In xen_memory_setup() pages that occur in gaps in the memory map are
released back to Xen. This reduces the domain's current page count in
the hypervisor. The Xen balloon driver does not correctly decrease
its initial current_pages count to reflect this. If 'delta' pages are
released and the target is adjusted the resulting reservation is
always 'delta' less than the requested target.
This affects dom0 if the initial allocation of pages overlaps the PCI
memory region but won't affect most domU guests that have been setup
with pseudo-physical memory maps that don't have gaps.
Fix this by accouting for the released pages when starting the balloon
driver.
If the domain's targets are managed by xapi, the domain may eventually
run out of memory and die because xapi currently gets its target
calculations wrong and whenever it is restarted it always reduces the
target by 'delta'.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If we want to use granted pages for AIO, changing the mappings of a user
vma and the corresponding p2m is not enough, we also need to update the
kernel mappings accordingly.
Currently this is only needed for pages that are created for user usages
through /dev/xen/gntdev. As in, pages that have been in use by the
kernel and use the P2M will not need this special mapping.
However there are no guarantees that in the future the kernel won't
start accessing pages through the 1:1 even for internal usage.
In order to avoid the complexity of dealing with highmem, we allocated
the pages lowmem.
We issue a HYPERVISOR_grant_table_op right away in
m2p_add_override and we remove the mappings using another
HYPERVISOR_grant_table_op in m2p_remove_override.
Considering that m2p_add_override and m2p_remove_override are called
once per page we use multicalls and hypercall batching.
Use the kmap_op pointer directly as argument to do the mapping as it is
guaranteed to be present up until the unmapping is done.
Before issuing any unmapping multicalls, we need to make sure that the
mapping has already being done, because we need the kmap->handle to be
set correctly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Removed GRANT_FRAME_BIT usage]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add an highmem parameter to alloc_xenballooned_pages, to allow callers to
request lowmem or highmem pages.
Fix the code style of free_xenballooned_pages' prototype.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Commit 7765be (Fix RCU_BOOST race handling current->rcu_read_unlock_special)
introduced a new ->rcu_boosted field in the task structure. This is
redundant because the existing ->rcu_boost_mutex will be non-NULL at
any time that ->rcu_boosted is nonzero. Therefore, this commit removes
->rcu_boosted and tests ->rcu_boost_mutex instead.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
We only need to constrain the compiler if we are actually exiting
the top-level RCU read-side critical section. This commit therefore
moves the first barrier() cal in __rcu_read_unlock() to inside the
"if" statement, thus avoiding needless register flushes for inner
rcu_read_unlock() calls.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The differences between rcu_assign_pointer() and RCU_INIT_POINTER() are
subtle, and it is easy to use the the cheaper RCU_INIT_POINTER() when
the more-expensive rcu_assign_pointer() should have been used instead.
The consequences of this mistake are quite severe.
This commit therefore carefully lays out the situations in which it it
permissible to use RCU_INIT_POINTER().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Recent changes to gcc give warning messages on rcu_assign_pointers()'s
checks that allow it to determine when it is OK to omit the memory
barrier. Stephen Hemminger tried a number of gcc tricks to silence
this warning, but #pragmas and CPP macros do not work together in the
way that would be required to make this work.
However, we now have RCU_INIT_POINTER(), which already omits this
memory barrier, and which therefore may be used when assigning NULL to
an RCU-protected pointer that is accessible to readers. This commit
therefore makes rcu_assign_pointer() unconditionally emit the memory
barrier.
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
RCU no longer uses this global variable, nor does anyone else. This
commit therefore removes this variable. This reduces memory footprint
and also removes some atomic instructions and memory barriers from
the dyntick-idle path.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_dereference_bh_protected() and rcu_dereference_sched_protected()
macros are synonyms for rcu_dereference_protected() and are not used
anywhere in mainline. This commit therefore removes them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add trace events to record grace-period start and end, quiescent states,
CPUs noticing grace-period start and end, grace-period initialization,
call_rcu() invocation, tasks blocking in RCU read-side critical sections,
tasks exiting those same critical sections, force_quiescent_state()
detection of dyntick-idle and offline CPUs, CPUs entering and leaving
dyntick-idle mode (except from NMIs), CPUs coming online and going
offline, and CPUs being kicked for staying in dyntick-idle mode for too
long (as in many weeks, even on 32-bit systems).
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcu: Add the rcu flavor to callback trace events
The earlier trace events for registering RCU callbacks and for invoking
them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
This commit adds the RCU flavor to those trace events.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This patch #ifdefs TINY_RCU kthreads out of the kernel unless RCU_BOOST=y,
thus eliminating context-switch overhead if RCU priority boosting has
not been configured.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add event-trace markers to TREE_RCU kthreads to allow including these
kthread's CPU time in the utilization calculations.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add a string to the rcu_batch_start() and rcu_batch_end() trace
messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
"rcu_preempt"). The trace messages for the actual invocations
themselves are not marked, as it should be clear from the
rcu_batch_start() and rcu_batch_end() events before and after.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds the trace_rcu_utilization() marker that is to be
used to allow postprocessing scripts compute RCU's CPU utilization,
give or take event-trace overhead. Note that we do not include RCU's
dyntick-idle interface because event tracing requires RCU protection,
which is not available in dyntick-idle mode.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There was recently some controversy about the overhead of invoking RCU
callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
start and stop of a batch of callbacks and also for each callback invoked.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull the code that waits for an RCU grace period into a single function,
which is then called by synchronize_rcu() and friends in the case of
TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
the case of TINY_RCU and TINY_PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Take a first step towards untangling Linux kernel header files by
placing the struct rcu_head definition into include/linux/types.h
and including include/linux/types.h in include/linux/rcupdate.h
where struct rcu_head used to be defined. The actual inclusion point
for include/linux/types.h is with the rest of the #include directives
rather than at the point where struct rcu_head used to be defined,
as suggested by Mathieu Desnoyers.
Once this is in place, then header files that need only rcu_head
can include types.h rather than rcupdate.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic. This commit therefore adds a replacement diagnostic
based on PROVE_RCU.
Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious. This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.
Also, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel parameter is enabled.
Finally, fix some checkpatch whitespace complaints in lockdep.c.
Again, you must enable PROVE_RCU to see these new diagnostics. But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.
The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an event to monitor comm value changes of tasks. Such an event
becomes vital, if someone desires to control threads of a process in
different manner.
A natural characteristic of threads is its comm value, and helpfully
application developers have an opportunity to change it in runtime.
Reporting about such events via proc connector allows to fine-grain
monitoring and control potentials, for instance a process control daemon
listening to proc connector and following comm value policies can place
specific threads to assigned cgroup partitions.
It might be possible to achieve a pale partial one-shot likeness without
this update, if an application changes comm value of a thread generator
task beforehand, then a new thread is cloned, and after that proc
connector listener gets the fork event and reads new thread's comm value
from procfs stat file, but this change visibly simplifies and extends the
matter.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 7361c36c52 (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.
This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.
# Events: 154K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... .................. .........................
#
10.40% hackbench [kernel.kallsyms] [k] put_pid
8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock
4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb
4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns
4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred
2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm
2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count
1.75% hackbench [kernel.kallsyms] [k] fget_light
1.51% hackbench [kernel.kallsyms] [k]
__mutex_lock_interruptible_slowpath
1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb
This patch includes SCM_CREDENTIALS information in a af_unix message/skb
only if requested by the sender, [man 7 unix for details how to include
ancillary data using sendmsg() system call]
Note: This might break buggy applications that expected SCM_CREDENTIAL
from an unaware write() system call, and receiver not using SO_PASSCRED
socket option.
If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.
Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)
hackbench 20 thread 2000
4.228 sec instead of 9.102 sec
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use sk_buff fragment capabilities to link together incoming skbs
instead of allocating a new skb for reassembly and copying.
The new reassembly code works equally well for ERTM and streaming
mode, so there is now one reassembly function instead of two.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This patch introduces 3 trace points to prepare for tracing
rpm_idle/rpm_suspend/rpm_resume functions, so we can use these
trace points to replace the current dev_dbg().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Correct comment errors, that mistake cpu partial objects number as pages
number, may make reader misunderstand.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
tx params are currently configured per hw, although they
should be configured per interface.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Whenever the scan request or tx_mgmt is requesting not to
use CCK rate for managemet frames through
NL80211_ATTR_TX_NO_CCK_RATE attribute, then mac80211 should
select appropriate least non-CCK rate. This could help to
send P2P probes and P2P action frames at non 11b rates
without diabling 11b rates globally.
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new nl80211 attribute to specify whether to send the management
frames in CCK rate or not. As of now the wpa_supplicant is disabling
CCK rate at P2P init itself. So this patch helps to send P2P probe
request/probe response/action frames being sent at non CCK rate in 2GHz
without disabling 11b rates.
This attribute is used with NL80211_CMD_TRIGGER_SCAN and
NL80211_CMD_FRAME commands to disable CCK rate for management frame
transmission.
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Protect 'cb' and 'cb_context' arguments in nci_data_exchange.
In fact, this implements a queue with max length of 1 data
exchange transactions in parallel.
Signed-off-by: Ilan Elias <ilane@ti.com>
Acked-by: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
TSF can be kept per vif.
Add ieee80211_vif param to set/get/reset_tsf, and move
the debugfs entries to the per-vif directory.
Update all the drivers that implement these callbacks.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Rename struct tcp_skb_cb "flags" to "tcp_flags" to ease code review and
maintenance.
Its content is a combination of FIN/SYN/RST/PSH/ACK/URG/ECE/CWR flags
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.
Fix these broken references, sometimes by dropping the irrelevant text
they were part of.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
That flag no longer makes sense, since we don't look up automount points
as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.
With our new non-eager automount semantics, one discussion has been
about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.
So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct tcp_skb_cb contains a "flags" field containing either tcp flags
or IP dsfield depending on context (input or output path)
Introduce ip_dsfield to make the difference clear and ease maintenance.
If later we want to save space, we can union flags/ip_dsfield
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While playing with a new ADSL box at home, I discovered that ECN
blackhole can trigger suboptimal quickack mode on linux : We send one
ACK for each incoming data frame, without any delay and eventual
piggyback.
This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
segment, this is because this segment was a retransmit.
Refine this heuristic and apply it only if we seen ECT in a previous
segment, to detect ECN blackhole at IP level.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: Jerry Chu <hkchu@google.com>
CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
CC: Jim Gettys <jg@freedesktop.org>
CC: Dave Taht <dave.taht@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)
Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).
But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup. Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.
This commit just adds the flag and logic, no users yet, though. It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the device pass the USB2 software LPM and the host supports hardware
LPM, enable hardware LPM for the device to let the host decide when to
put the link into lower power state.
If hardware LPM is enabled for a port and driver wants to put it into
suspend, it must first disable hardware LPM, resume the port into U0,
and then suspend the port.
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Check device's LPM capability by examining the bmAttibutes field of the
USB2.0 Extension Descriptor.
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This commit gets BOS(Binary Device Object Store) descriptor set for Super
Speed devices and High Speed devices which support BOS descriptor.
BOS descriptor is used to report additional USB device-level capabilities
that are not reported via the Device descriptor. By getting BOS descriptor
set, driver can check device's device-level capability such as LPM
capability.
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The IEEE 1588 standard (PTP) has a provision for a "one step" mode, where
time stamps on outgoing event packets are inserted into the packet by the
hardware on the fly. This patch adds a new flag for the SIOCSHWTSTAMP
ioctl that lets user space programs request this mode.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct pm_domain_data data type is defined in such a way that
adding new fields specific to the generic PM domains code will
require include/linux/pm.h to be modified. As a result, data types
used only by the generic PM domains code will be defined in two
headers, although they all should be defined in pm_domain.h and
pm.h will need to include more headers, which won't be very nice.
For this reason change the definition of struct pm_subsys_data
so that its domain_data member is a pointer, which will allow
struct pm_domain_data to be subclassed by various PM domains
implementations. Remove the need_restore member from
struct pm_domain_data and make the generic PM domains code
subclass it by adding the need_restore member to the new data type.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Merge commit e8b364b88c
(PM / Clocks: Do not acquire a mutex under a spinlock) fixing
a regression in drivers/base/power/clock_ops.c.
Conflicts:
drivers/base/power/clock_ops.c
As mentioned by http://www.microsoft.com/whdc/device/input/DigitizerDrvs_touch.mspx
multitouch devices are those that have the input report HID_CONTACTID.
This patch detects this and unloads the generic-usb driver.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We had need to see the difference between scheduling a runnable task and
a runnable task being involuntarily preempted.
No app should rely on the old string output (the binary
trace event record format is not changed).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If optional discard support in dm-crypt is enabled, discards requests
bypass the crypt queue and blocks of the underlying device are discarded.
For the read path, discarded blocks are handled the same as normal
ciphertext blocks, thus decrypted.
So if the underlying device announces discarded regions return zeroes,
dm-crypt must disable this flag because after decryption there is just
random noise instead of zeroes.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Device drivers that create and destroy SR-IOV virtual functions via
calls to pci_enable_sriov() and pci_disable_sriov can cause catastrophic
failures if they attempt to destroy VFs while they are assigned to
guest virtual machines. By adding a flag for use by the KVM module
to indicate that a device is assigned a device driver can check that
flag and avoid destroying VFs while they are assigned and avoid system
failures.
CC: Ian Campbell <ijc@hellion.org.uk>
CC: Konrad Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The EFR (Enhenced-Features-Register) is located at a different offset
than the other devices supporting UART_CAP_EFR. This change add a special
setup quick to set UPF_EXAR_EFR on the port. UPF_EXAR_EFR is then used to
the port type to PORT_XR17D15X since it is for sure a XR17D15X uart.
Signed-off-by: Søren Holm <sgh@sgh.dk>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add new xs_reset_watches function to shutdown watches from old kernel after
kexec boot. The old kernel does not unregister all watches in the
shutdown path. They are still active, the double registration can not
be detected by the new kernel. When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL). An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.
With this change the xenstored is instructed to wipe all active watches
for the guest. However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.
[v5: use xs_single instead of passing a dummy string to xs_talkv]
[v4: ignore -EEXIST in xs_reset_watches]
[v3: use XS_RESET_WATCHES instead of XS_INTRODUCE]
[v2: move all code which deals with XS_INTRODUCE into xs_introduce()
(based on feedback from Ian Campbell); remove casts from kvec assignment]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
[v1: Redid the git description a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Update include/xen/interface/io/xs_wire.h from xen-unstable.
Now entries in xsd_sockmsg_type were added.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Now that the hypercall interface changes are in -unstable, make the
kernel side code not ignore the segment (aka domain) number anymore
(which results in pretty odd behavior on such systems). Rather, if
only the old interfaces are available, don't call them for devices on
non-zero segments at all.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[v1: Edited git description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] kvm: extension capability for new address space layout
[S390] kvm: fix address mode switching
* 'for-linus' of git://git.kernel.dk/linux-block:
floppy: use del_timer_sync() in init cleanup
blk-cgroup: be able to remove the record of unplugged device
block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
mm: Add comment explaining task state setting in bdi_forker_thread()
mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
block: simplify force plug flush code a little bit
block: change force plug flush call order
block: Fix queue_flag update when rq_affinity goes from 2 to 1
block: separate priority boosting from REQ_META
block: remove READ_META and WRITE_META
xen-blkback: fixed indentation and comments
xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
Add management interface events for blocking/unblocking a device.
Sender of the block device command gets cmd complete and other
mgmt sockets get the event. Event is also sent to mgmt sockets when
blocking is done with ioctl, e.g when blocking a device with
hciconfig. This makes it possible for bluetoothd to track status
of blocked devices when a third party block or unblocks a device.
Event sending is handled in mgmt_device_blocked function which gets
called from hci_blacklist_add in hci_core.c. A pending command is
added in mgmt_block_device, so that it can found when sending the
event - the event is not sent to the socket from which the pending
command came. Locks were moved out from hci_core.c to hci_sock.c
and mgmt.c, because locking is needed also for mgmt_pending_add in
mgmt.c.
Signed-off-by: Antti Julku <antti.julku@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The function crypto_blkcipher_setkey() called by smp_e()
can sleep, so all the crypto work has to be moved to
hci_dev workqueue.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The objective is to make the core to have as little as possible
information about SMP procedures and logic. Now, all the SMP
specific information is hidden from the core.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add command to management interface for enabling/disabling the
fast connectable mode.
Signed-off-by: Antti Julku <antti.julku@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
One piece of information that was lost when using the mgmt interface,
was the type of the connection. Using HCI events we used to know
the type of the connection based on the type of the event, e.g.
HCI_LE_Connection_Complete for LE links.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Add HCI_CONN_LE_SMP_PEND flag to indicate that SMP is pending
for that connection. This allows to have information that an SMP
procedure is going on for that connection.
We use the HCI_CONN_ENCRYPT_PEND to indicate that encryption
(HCI_LE_Start_Encryption) is pending for that connection.
While a SMP procedure is going on we hold an reference to the
connection, to avoid disconnections.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This checks if there is any existing connection according to its type
before start iterating in the list and immediately stop iterating when
reaching the number of connections.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The parameter's origin type is long. On an i386 architecture, it can
easily be larger than 0x80000000, causing this function to convert it
to a sign-extended u64 type.
Change the type to unsigned long so we get the correct result.
Signed-off-by: hank <pyu@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <stable@kernel.org>
[ build fix ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On the platforms which are x2apic and interrupt-remapping
capable, Linux kernel is enabling x2apic even if the BIOS
doesn't. This is to take advantage of the features that x2apic
brings in.
Some of the OEM platforms are running into issues because of
this, as their bios is not x2apic aware. For example, this was
resulting in interrupt migration issues on one of the platforms.
Also if the BIOS SMI handling uses APIC interface to send SMI's,
then the BIOS need to be aware of x2apic mode that OS has
enabled.
On some of these platforms, BIOS doesn't have a HW mechanism to
turnoff the x2apic feature to prevent OS from enabling it.
To resolve this mess, recent changes to the VT-d2 specification:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf
includes a mechanism that provides BIOS a way to request system
software to opt out of enabling x2apic mode.
Look at the x2apic optout flag in the DMAR tables before
enabling the x2apic mode in the platform. Also print a warning
that we have disabled x2apic based on the BIOS request.
Kernel boot parameter "intremap=no_x2apic_optout" can be used to
override the BIOS x2apic optout request.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.171766616@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hex2bin converts a hexadecimal string to its binary representation.
The original version of hex2bin did not do any error checking. This
patch adds error checking and returns the result.
Changelog v1:
- removed unpack_hex_byte()
- changed return code from boolean to int
Changelog:
- use the new unpack_hex_byte()
- add __must_check compiler option (Andy Shevchenko's suggestion)
- change function API to return error checking result
(based on Tetsuo Handa's initial patch)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Add IP6_TNL_F_USE_ORIG_FWMARK to ip6_tunnel, so that ip6_tnl_xmit2()
makes a route lookup taking into account skb->fwmark and doesnt cache
lookup result.
This permits more flexibility in policies and firewall setups.
To setup such a tunnel, "fwmark inherit" option should be added to "ip
-f inet6 tunnel" command.
Reported-by: Anders Franzen <Anders.Franzen@ericsson.com>
CC: Hans Schillström <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NFC Controller Interface (NCI) is a standard
communication protocol between an NFC Controller (NFCC)
and a Device Host (DH), defined by the NFC Forum.
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The file nfc.h was moved from include/net to include/net/nfc,
since new NFC header files will be added to include/net/nfc.
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add 2 new nfc control operations:
dev_up to turn on the nfc device
dev_down to turn off the nfc device
Signed-off-by: Ilan Elias <ilane@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
598841ca99 ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address mode with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).
We use number 71 to avoid collisions with other pending kvm patches
as requested by Alexander Graf.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
In systems where there is no hardware signal from the processor to the
PMIC to initiate the final power off sequence we must initiate the
shutdown with a register write to the PMIC. Support such systems in the
driver. Since this may prevent a full shutdown of the system platform
data is used to enable the feature.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the driver (or most likely firmware) decides which AP to use
for roaming based on internal scan result processing, user space
needs to be notified of PMKSA caching candidates to allow RSN
pre-authentication to be used.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add function to find vendor-specific ie (along with
vendor-specific ie struct definition and P2P OUI values)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds support for LZO compression when storing the register
cache.
For a typical device whose register map would normally occupy 25kB or 50kB
by using the LZO compression technique, one can get down to ~5-7kB. There
might be a performance penalty associated with each individual read/write
due to decompressing/compressing the underlying cache, however that should not
be noticeable. These memory benefits depend on whether the target architecture
can get rid of the memory occupied by the original register defaults cache
which is marked as __devinitconst. Nevertheless there will be some memory
gain even if the target architecture can't get rid of the original register
map, this should be around ~30-32kB instead of 50kB.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch adds support for the rbtree cache compression type.
Each rbnode manages a variable length block of registers. There can be no
two nodes with overlapping blocks. Each block has a base register and a
currently top register, all the other registers, if any, lie in between these
two and in ascending order.
The reasoning behind the construction of this rbtree is simple. In the
snd_soc_rbtree_cache_init() function, we iterate over the register defaults
provided by the regcache core. For each register value that is non-zero we
insert it in the rbtree. In order to determine in which rbnode we need
to add the register, we first look if there is another register already
added that is adjacent to the one we are about to add. If that is the case
we append it in that rbnode block, otherwise we create a new rbnode
with a single register in its block and add it to the tree.
There are various optimizations across the implementation to speed up lookups
by caching the most recently used rbnode.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Tested-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This is the simplest form of a cache available in regcache. Any
registers whose default value is 0 are ignored. If any of those
registers are modified in the future, they will be placed in the
cache on demand. The cache layout is essentially using the provided
register defaults by the regcache core directly and does not re-map
it to another representation.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This patch introduces caching support for regmap. The regcache API
has evolved essentially out of ASoC soc-cache so most of the actual
caching types (except LZO) have been tested in the past.
The purpose of regcache is to optimize in time and space the handling
of register caches. Time optimization is achieved by not having to go
over a slow bus like I2C to read the value of a register, instead it is
cached locally in memory and can be retrieved faster. Regarding space
optimization, some of the cache types are better at packing the caches,
for e.g. the rbtree and the LZO caches. By doing this the sacrifice in
time still wins over doing I2C transactions.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Tested-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
When debugging tight race conditions, it can be helpful to have a
synchronized tracing method. Although in most cases the global clock
provides this functionality, if timings is not the issue, it is more
comforting to know that the order of events really happened in a precise
order.
Instead of using a clock, add a "counter" that is simply an incrementing
atomic 64bit counter that orders the events as they are perceived to
happen.
The trace_clock_counter() is added from the attempt by Peter Zijlstra
trying to convert the trace_clock_global() to it. I took Peter's counter
code and made trace_clock_counter() instead, and added it to the choice
of clocks. Just echo counter > /debug/tracing/trace_clock to activate
it.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit 946cedccbd (tcp: Change possible SYN flooding messages)
added a build error if CONFIG_SYN_COOKIES=n
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6:
mfd: Fix omap-usb-host build failure
mfd: Make omap-usb-host TLL mode work again
mfd: Set MAX8997 irq pointer
mfd: Fix initialisation of tps65910 interrupts
mfd: Check for twl4030-madc NULL pointer
mfd: Copy the device pointer to the twl4030-madc structure
mfd: Rename wm8350 static gpio_set_debounce()
mfd: Fix value of WM8994_CONFIGURE_GPIO
* git://github.com/davem330/net: (62 commits)
ipv6: don't use inetpeer to store metrics for routes.
can: ti_hecc: include linux/io.h
IRDA: Fix global type conflicts in net/irda/irsysctl.c v2
net: Handle different key sizes between address families in flow cache
net: Align AF-specific flowi structs to long
ipv4: Fix fib_info->fib_metrics leak
caif: fix a potential NULL dereference
sctp: deal with multiple COOKIE_ECHO chunks
ibmveth: Fix checksum offload failure handling
ibmveth: Checksum offload is always disabled
ibmveth: Fix issue with DMA mapping failure
ibmveth: Fix DMA unmap error
pch_gbe: support ML7831 IOH
pch_gbe: added the process of FIFO over run error
pch_gbe: fixed the issue which receives an unnecessary packet.
sfc: Use 64-bit writes for TX push where possible
Revert "sfc: Use write-combining to reduce TX latency" and follow-ups
bnx2x: Fix ethtool advertisement
bnx2x: Fix 578xx link LED
bnx2x: Fix XMAC loopback test
...
In a few places in the kernel, the code prints
a human-readable USB device speed (eg. "high speed").
This involves a switch statement sometimes wrapped
around in ({ ... }) block leading to code repetition.
To mitigate this issue, this commit introduces
usb_speed_string() function, which returns
a human-readable name of provided speed.
It also changes a few places switch was used to use
this new function. This changes a bit the way the
speed is printed in few instances at the same time
standardising it.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tcp_md5sig_pool is currently an 'array' (a percpu object) of pointers to
struct tcp_md5sig_pool. Only the pointers are NUMA aware, but objects
themselves are all allocated on a single node.
Remove this extra indirection to get proper percpu memory (NUMA aware)
and make code simpler.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 0856a30409.
As requested by Eric Dumazet, it has various ref-counting
problems and has introduced regressions. Eric will add
a more suitable version of this performance fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
A user-space process must use ETHTOOL_GRXCLSRLCNT to find the number
of classification rules, then allocate a buffer of the right size,
then use ETHTOOL_GRXCLSRLALL to fill the buffer. If some other
process inserts or deletes a rule between those two operations,
the user buffer might turn out to be the wrong size.
If it's too small, the return value will be -EMSGSIZE. But if it's
too large, there is no indication of this. Fix this by updating
the rule_cnt field on return.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct the description of ethtool_rxnfc::rule_locs; it is an array
of currently used locations, not all possible valid locations.
Add note that drivers must not use ethtool_rxnfc::rule_locs.
The rule_locs argument to ethtool_ops::get_rxnfc is either NULL or a
pointer to an array of u32, so change the parameter type accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The location of an RX flow classification rule is needed to identify
it for retrieval, replacement or deletion. However it also defines
the priority of the rule in the case that a flow is matched by
multiple rules. This is what I intended to imply by referring to the
use of a TCAM, commonly used to implement that behaviour.
However there are other ways this can be done, and it is better to
specify this explicitly. Further, I want to add the option for
automatic selection of rule locations.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refer consistently to 'classification rules' or just 'rules' rather
than 'filter specifications' or 'filter rules'.
Refer consistently to rule 'locations' and not 'indices'.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is compile tested only.
Suggested by dumpster diving in PAX.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a CAN Gateway/Router to route (and modify) CAN frames.
It is based on the PF_CAN core infrastructure for msg filtering and msg
sending and can optionally modify routed CAN frames on the fly.
CAN frames can *only* be routed between CAN network interfaces (one hop).
They can be modified with AND/OR/XOR/SET operations as configured by the
netlink configuration interface known e.g. from iptables. From the netlink
view this can-gw implements RTM_{NEW|DEL|GET}ROUTE for PF_CAN.
The CAN specific userspace tool to manage CAN routing entries can be found in
the CAN utils http://svn.berlios.de/wsvn/socketcan/trunk/can-utils/cangw.c
at the SocketCAN SVN on BerliOS.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attempt to reduce the number of IP packets emitted in response to single
SCTP packet (2e3216cd) introduced a complication - if a packet contains
two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
socket while processing first COOKIE_ECHO and then loses the association
and forgets to uncork the socket. To deal with the issue add new SCTP
command which can be used to set association explictly. Use this new
command when processing second COOKIE_ECHO chunk to restore the context
for SCTP state machine.
Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
iboe_get_rate was previously ->get_settings() called unlocked. This
fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
problem. Also fixed by calling __dev_get_by_index() instead of
dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
so bnx2fc_if_create() and fcoe_if_create() are called locked as they
are from other places.
- use __ethtool_get_settings() in bonding code
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
v2->v3:
-removed dev_ethtool_get_settings()
-added ASSERT_RTNL into __ethtool_get_settings()
-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
around it and __ethtool_get_settings() call
v1->v2:
add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are references to this PHY chip in the generic mii.h header, so removing them.
Re-jiggle the changed comments, in response to points raised by Ben Hutchings.
Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whitespace changes - spaces converted to tabs after each define name and value
Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.
As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Possible SYN flooding on port xxxx " messages can fill logs on servers.
Change logic to log the message only once per listener, and add two new
SNMP counters to track :
TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client
TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.
Based on a prior patch from Tom Herbert, and suggestions from David.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was pointed out by 'make versioncheck' that some includes of
linux/version.h are not needed in include/.
This patch removes them.
When I last posted the patch, the ceph bit was ACK'ed by Sage Weil, so
I've added that below.
The pwc-ioctl change generated quite a bit of discussion about V4L version
numbers in general, but as far as I can tell, no concensus was reached on
what the long term solution should be, so in the mean time I think we
could start by just removing the unneeded include, which is why I'm
resending the patch with that hunk still included.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Building a kernel with hotplug disabled results in a link failure:
`bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o
This is because of bgpio_remove() is exported. It is illegal to export
symbols which are discarded either at link time or as part of an
init/exit section.
Fix this by dropping the __devexit attributation from bgpio_remove().
Also drop the __devinit attributation from bgpio_init().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert the post-3.0 commit 82f9d486e5 ("memcg: add
memory.vmscan_stat").
The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.
The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between. Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.
Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before permitting 'security.evm' to be updated, 'security.evm' must
exist and be valid. In the case that there are no existing EVM protected
xattrs, it is safe for posix acls to update the mode bits.
To differentiate between no 'security.evm' xattr and no xattrs used to
calculate 'security.evm', this patch defines INTEGRITY_NOXATTR.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
The posix xattr acls are 'system' prefixed, which normally would not
affect security.evm. An interesting side affect of writing posix xattr
acls is their modifying of the i_mode, which is included in security.evm.
This patch updates security.evm when posix xattr acls are written.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Fix kernel-doc warning in net/cfg80211.h:
Warning(include/net/cfg80211.h:1884): No description found for parameter 'registered'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Per sec 7.1.3.5 of draft 12.0 of 802.11s, mesh frames indicate the
presence of the mesh control header in their QoS header.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Set SSID information from nl80211 beacon parameters. Advertise changes
in SSID to low level drivers.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To properly maintain the peer's block ack window, the driver needs to be
able to control the new starting sequence number that is sent along with
the BlockAckReq frame.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For ipv6 link-local addresses, sunrpc do not compare those scope id.
This patch let sunrpc compares scope id only on link-local addresses.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:
324 if (addr_type & IPV6_ADDR_LINKLOCAL) {
325 if (addr_len >= sizeof(struct sockaddr_in6) &&
326 addr->sin6_scope_id) {
327 /* Override any existing binding, if another one
328 * is supplied by user.
329 */
330 sk->sk_bound_dev_if = addr->sin6_scope_id;
331 }
332
333 /* Binding to link-local address requires an interface */
334 if (!sk->sk_bound_dev_if) {
335 err = -EINVAL;
336 goto out_unlock;
337 }
Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There are no more users...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current code is sort of hackish in that it assumes a referral is always
matched to an export. When we add support for junctions that may not be the
case.
We can replace nfsd4_path() with a function that encodes the components
directly from the dentries. Since nfsd4_path is currently the only user of
the 'ex_pathname' field in struct svc_export, this has the added benefit
of allowing us to get rid of that.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Introduce filtering for scheduled scans to reduce the number of
unnecessary results (which cause useless wake-ups).
Add a new nested attribute where sets of parameters to be matched can
be passed when starting a scheduled scan. Only scan results that
match any of the sets will be returned.
At this point, the set consists of a single parameter, an SSID. This
can be easily extended in the future to support more complex matches.
Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
add WIPHY_FLAG_AP_UAPSD flag to indicate uapsd support on
AP mode.
Advertise it to userspace by including a new
NL80211_ATTR_SUPPORT_AP_UAPSD attribute.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When this flag is set, Tx A-MPDU sessions will not be started by
mac80211. This flag is required for devices that support Tx A-MPDU setup
in hardware.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the rssi of the current AP drops, both wpa_supplicant and the
firmware may do a background scan to find a better AP and try to
associate. Since firmware based roaming is faster, inform
wpa_supplicant to avoid roaming and let the firmware decide to
roam if necessary.
For fullmac drivers like ath6kl, it is just enough to provide the
ESSID and the firmware will decide on the BSSID. Since it is not
possible to do pre-auth during roaming for fullmac drivers, the
wpa_supplicant needs to completely disconnect with the old AP and
reconnect with the new AP. This consumes lot of time and it is
better to leave the roaming decision to the firmware.
Signed-off-by: Vivek Natarajan <nataraja@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Specs say about size 2 (u16) and my 14e4:4727 has board rev 0x1211.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The qi->q_lock lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The iommu->register_lock can be taken in atomic context and therefore
must not be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The oprofilefs_lock can be taken in atomic context (in profiling
interrupts) and therefore cannot cannot be preempted on -rt -
annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no reason to allow the lock protecting rwsems (the
ownerless variant) to be preemptible on -rt. Convert it to raw.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no reason to have the spin_lock protecting the semaphore
preemptible on -rt. Annotate it as a raw_spinlock.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
( On rt this also solves lockdep complaining about the
rt_mutex.wait_lock being not initialized. )
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The thread_group_cputimer lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The logbuf_lock lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ merged and fixed it ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The prop_local_percpu::lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The percpu_counter::lock can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The kprobe locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.
In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some irq chips need the irq_set_wake() functionality, but do not
require a irq_set_wake() callback. Instead of forcing an empty
callback to be implemented add a flag which notes this fact. Check for
the flag in set_irq_wake_real() and return success when set.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>