The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants. While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such rule is unnecessarily restrictive and very difficult
to enforce. It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.
The current behavior in such situations is broken. Whoever enters
removal path first takes the node off the hierarchy and then
deactivates. Following removers either return as soon as it notices
that it's not the first one or can't even find the target node as it
has already been removed from the hierarchy. In both cases, the
following removers may finish prematurely while the nodes which should
be removed and drained are still being processed by the first one.
This patch restructures so that multiple removers, whether through
recursion or direction invocation, always follow the following rules.
* When there are multiple concurrent removers, only one puts the base
ref.
* Regardless of which one puts the base ref, all removers are blocked
until the target node is fully deactivated and removed.
To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one. __kernfs_deactivate() is
called directly from __kernfs_removal() and drops and regrabs
kernfs_mutex for each descendant to drain active refs. As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.
The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained. Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.
v2: Draining separated into a separate loop (used to be in the same
loop as unlink) and done from __kernfs_deactivate(). This is to
allow exposing deactivation as a separate interface later.
Root node removal was broken in v1 patch. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and deny adding new nodes under or
renaming it; however, its role overlaps those of deactivation and
removal from rbtree.
It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations. There's no reason to have them separate making
things more complex than necessary.
KERNFS_REMOVED is also used to decide whether a node is still visible
to vfs layer, which is rather redundant as equivalent determination
can be made by testing whether the node is on its parent's children
rbtree or not.
This patch removes KERNFS_REMOVED.
* Instead of KERNFS_REMOVED, each node now starts its life
deactivated. This means that we now use both atomic_add() and
atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
generates an overflow warnings when negating INT_MIN as the negation
can't be represented as a positive number. Nothing is actually
broken but let's bump BIAS by one to avoid the warnings for archs
which negates the subtrahend..
* KERNFS_REMOVED tests in add and rename paths are replaced with
kernfs_get/put_active() of the target nodes. Due to the way the add
path is structured now, active ref handling is done in the callers
of kernfs_add_one(). This will be consolidated up later.
* kernfs_remove_one() is updated to deactivate instead of setting
KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(),
which is now renamed to kernfs_drain().
* kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
dropped. A node which is removed from the children rbtree is not
included in the iteration in the first place. This means that a
node may be visible through vfs a bit longer - it's now also visible
after deactivation until the actual removal. This slightly enlarged
window difference doesn't make any difference to the userland.
* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
checks on the active ref.
* Some comment style updates in the affected area.
v2: Reordered before removal path restructuring. kernfs_active()
dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
used in the lookup paths.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().
While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF and use
KERNFS_LOCKDEP in kernfs_deactivate() too.
While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate(). We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal. The removal path will be restructured to address the issue.
To help such restructure which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq. This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pci/resource:
PCI: Allocate 64-bit BARs above 4G when possible
PCI: Enforce bus address limits in resource allocation
PCI: Split out bridge window override of minimum allocation address
agp/ati: Use PCI_COMMAND instead of hard-coded 4
agp/intel: Use CPU physical address, not bus address, for ioremap()
agp/intel: Use pci_bus_address() to get GTTADR bus address
agp/intel: Use pci_bus_address() to get MMADR bus address
agp/intel: Support 64-bit GMADR
agp/intel: Rename gtt_bus_addr to gtt_phys_addr
drm/i915: Rename gtt_bus_addr to gtt_phys_addr
agp: Use pci_resource_start() to get CPU physical address for BAR
agp: Support 64-bit APBASE
PCI: Add pci_bus_address() to get bus address of a BAR
PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev
PCI: Change pci_bus_region addresses to dma_addr_t
My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.
This reverts parts of c320b976d7 ("PCI: Add implementation for PRI
capability"), removing these interfaces:
pci_pri_enabled()
pci_pri_stopped()
pci_pri_status()
[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Joerg Roedel <joro@8bytes.org>
John W. Linville says:
====================
Please pull these updates for the 3.14 stream!
For the mac80211 bits, Johannes says:
"Felix adds some helper functions for P2P NoA software tracking, Joe
fixes alignment (but as this apparently never caused issues I didn't
send it to 3.13), Kyeyoon/Jouni add QoS-mapping support (a Hotspot 2.0
feature), Weilong fixed a bunch of checkpatch errors and I get to play
fire-fighter or so and clean up other people's locking issues. I also
added nl80211 vendor-specific events, as we'd discussed at the wireless
summit."
For the iwlwifi bits, Emmanuel says:
"I have here a rework of the interrupt handling to meet RT kernel
requirements - basically we don't take any lock in the primary interrupt
handler. This gave me a good reason to clean things up a bit on the way.
There is also a fix of the QoS mapping along with a few workarounds for
hardware / firmware issues that are hard to hit.
Three fixes suggested by static analyzers, and other various stuff.
Most importantly, I update the Copyright note to include the new year."
For the bluetooth bits, Gustavo says:
"More patches to 3.14. The bulk of changes here is the 6LoWPAN support for
Bluetooth LE Devices. The commits that touches net/ieee802154/ are already
acked by David Miller. Other than that we have some RFCOMM fixes and
improvements plus fixes and clean ups all over the tree."
Beyond that, ath9k, brcmfmac, mwifiex, and wil6210 get their usual
level of attention. The wl1251 driver gets a number of updates,
and there are a handful of other bits here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This batch contains one single patch with the l2tp match
for xtables, from James Chapman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:
- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
instead of lower device which misses the necessary txq synchronization for
lower device such as txq stopping or frozen required by dev watchdog or
control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
when tso is disabled for lower device.
Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.
With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.
In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a uAPSD service period ends with an MMPDU, we currently just
send that MMPDU, but it obviously won't get the EOSP bit set as
it doesn't have a QoS header. This contradicts the standard, so
add a QoS-nulldata frame after the MMPDU to properly terminate
the service period with a frame that has EOSP set.
Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set
to 0 unconditionally but use the actual TID that was assigned.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Seventh bit of 8th byte of extended capabilities specifies wide
bandwidth support for TDLS links. Add this definition to ieee80211.
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch converts the temporary buffer in spc_emulate_inquiry() to
use dynamically allocated memory, instead of local stack memory.
Also bump SE_INQUIRY_BUF up to 1024 bytes to be safe when handling
multiple large SCSI name descriptors for EVPD=0x83.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Add infrastructure for referrals.
v2 changes:
- Fix unsigned long long division in core_alua_state_lba_dependent on
32-bit (Fengguang + Chen + Hannes)
- Fix compile warning in core_alua_state_lba_dependent (nab)
- Convert segment_* + sectors variables in core_alua_state_lba_dependent
to u64 (Hannes)
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pablo Neira Ayuso says:
====================
nf_tables updates for net-next
The following patchset contains the following nf_tables updates,
mostly updates from Patrick McHardy, they are:
* Add the "inet" table and filter chain type for this new netfilter
family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6
rules, this should help to simplify the burden in the administration
of dual stack firewalls. This also includes several patches to prepare
the infrastructure for this new table and a new meta extension to
match the layer 3 and 4 protocol numbers, from Patrick McHardy.
* Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used
in NFPROTO_INET, as we don't certainly know which one would be used,
also from Patrick McHardy.
* Do not allow to delete a table that contains sets, otherwise these
sets become orphan, from Patrick McHardy.
* Hold a reference to the corresponding nf_tables family module when
creating a table of that family type, to avoid the module deletion
when in use, from Patrick McHardy.
* Update chain counters before setting the chain policy to ensure that
we don't leave the chain in inconsistent state in case of errors (aka.
restore chain atomicity). This also fixes a possible leak if it fails
to allocate the chain counters if no counters are passed to be restored,
from Patrick McHardy.
* Don't check for overflows in the table counter if we are just renaming
a chain, from Patrick McHardy.
* Replay the netlink request after dropping the nfnl lock to load the
module that supports provides a chain type, from Patrick.
* Fix chain type module references, from Patrick.
* Several cleanups, function renames, constification and code
refactorizations also from Patrick McHardy.
* Add support to set the connmark, this can be used to set it based on
the meta mark (similar feature to -j CONNMARK --restore), from
Kristian Evensen.
* A couple of fixes to the recently added meta/set support and nft_reject,
and fix missing chain type unregistration if we fail to register our
the family table/filter chain type, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The event trigger code that checks for callback triggers before and
after recording of an event has lots of flags checks. This code is
duplicated throughout the ftrace events, kprobes and system calls.
They all do the exact same checks against the event flags.
Added helper functions ftrace_trigger_soft_disabled(),
event_trigger_unlock_commit() and event_trigger_unlock_commit_regs()
that consolidated the code and these are used instead.
Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
For a relocatable kernel since it can be loaded at any place, there
is no any relation between the kernel start addr and the memstart_addr.
So we can't calculate the memstart_addr from kernel start addr. And
also we can't wait to do the relocation after we get the real
memstart_addr from device tree because it is so late. So introduce
a new function we can use to get the first memblock address and size
in a very early stage (before machine_init).
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
The Nomadik I2C is now configured from the device tree on all platforms
using this controller. Delete the platform data header and move the
definitions into the driver so it is all contained in one single file.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The Nomadik I2C controller needs to have the slave set-up time
configured based off the clock used to drive the I2C bus block.
Currently this is done with static assignments assuming that the
block is clocked 48MHz which is pretty likely to be bug-prone.
Calculate the SLSU from the equation given in the datasheet
instead.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2
and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id
and session-id, the filtering decision can also include the L2TP
packet type (control or data), protocol version (2 or 3) and
encapsulation type (UDP or IP).
The most common use for this will likely be to filter L2TP data
packets of individual L2TP tunnels or sessions. While a u32 match can
be used, the L2TP protocol headers are such that field offsets differ
depending on bits set in the header, making rules for matching generic
L2TP connections cumbersome. This match extension takes care of all
that.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Minor nf_chain_type cleanups:
- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.
Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds kernel support for setting properties of tracked
connections. Currently, only connmark is supported. One use-case
for this feature is to provide the same functionality as
-j CONNMARK --save-mark in iptables.
Some restructuring was needed to implement the set op. The new
structure follows that of nft_meta.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These sentences are not proper English due to the typos. I had initially caught
one of them while trying to understand the regmap feature, and then I just ran
the vim spell checker, and went through all the items that would need to be
fixed for this header file.
Signed-off-by: Laszlo Papp <lpapp@kde.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
Add a utility function to get the number of channels supported by
the device, and update the places in the code that need this data.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[replace another occurrence in libertas, fix kernel-doc, fix bugs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Simplify vt-d related code with existing macros and introduce a new
macro for_each_active_drhd_unit() to enumerate all active DRHD unit.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Functions alloc_iommu() and parse_ioapics_under_ir()
are only used internally, so mark them as static.
[Joerg: Made detect_intel_iommu() non-static again for IA64]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Function dmar_parse_dev_scope() should release the PCI device reference
count gained in function dmar_parse_one_dev_scope() on error recovery,
otherwise it will cause PCI device object leakage.
This patch also introduces dmar_free_dev_scope(), which will be used
to support DMAR device hotplug.
Reviewed-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
* qcom/drivers:
tty: serial: Limit msm_serial_hs driver to platforms that use it
mmc: msm_sdcc: Limit driver to platforms that use it
usb: phy: msm: Move mach dependent code to platform data
Signed-off-by: Olof Johansson <olof@lixom.net>
This patch fix compilation error when driver is compiled
in multi-platform builds.
drivers/built-in.o: In function `msm_otg_link_clk_reset':
./drivers/usb/phy/phy-msm-usb.c:314: undefined reference to `clk_reset'
./drivers/usb/phy/phy-msm-usb.c:318: undefined reference to `clk_reset'
Use platform data supplied reset handlers and adjust error
messages reported when reset sequence fail.
This is an intermediate step before adding support for reset
framework and newer targets.
Signed-off-by: Ivan T. Ivanov <iivanov@mm-sol.com>
Acked-by: David Brown <davidb@codeaurora.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Nowadays we have CMA for obtaining the contiguous memory pages
efficiently. Let's kill the old kludge for reserving the memory pages
for large buffers. It was rarely useful (only for preserving pages
among module reloading or a little help by an early boot scripting),
used only by a couple of drivers, and yet it gives too much ugliness
than its benefit.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
From Simon Horman:
Renesas ARM Based SoC DT Fixes for v3.14
Revert the addition of SSI clocks to DT for the
r8a7790 (R-Car H2) and r8a7791 (R-Car M2) SoCs.
Unfortunately these patches prevent booting the
r8a7790-based Lager board and r8a7791-based Koelsch board
to the point where a serial output is available.
A solution to this problem is being sought but has not
yet been finalised so in the mean time revert the changes.
* tag 'renesas-dt-fixes-for-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
Revert "ARM: shmobile: r8a7791: Add SSI clocks in device tree"
Revert "ARM: shmobile: r8a7790: Add SSI clocks in device tree"
Signed-off-by: Olof Johansson <olof@lixom.net>
Chanwoo writes:
Update extcon for v3.14
This patchset add new driver of extcon-max14577.c which detect various external
connector and fix minor issue of extcon provider driver(extcon-arizona/palams/
gpio.c). Also, update documentation of previous 'switch' porting guide and
extcon git repository url.
Detailed description for patchset:
- New driver of extcon-max14577.c
: Add extcon-max14577.c drvier to support Maxim MUIC(Micro USB Interface
Controller) which detect USB/TA/JIG/AUDIO-DOCK and additional accessory
according to each resistance when connected external connector.
- extcon-arizoan.c driver
: Code clean to use define macro instead of hex value
: Fix minor issue to reset back to our staring state
: Fix race with microphone detection and removal
- extcon-palmas.c driver
: Fix minor issue and renaming compatible string of Devicetree
- extcon-gpio.c driver
: Fix bug about ordering initialization of gpio pin on probe()
: Send uevent after wakeup from suspend state because some SoC
haven't wakeup interrupt on suspend state.
- Documentation (Documentation/extcon/porting-android-switch-class)
: Fix switch class porting guide
- Update extcon git repository url
Now that we've got code for raid5/6 stripe awareness, bcache just needs
to know about the stripes and when writing partial stripes is expensive
- we probably don't want to enable this optimization for raid1 or 10,
even though they have stripes. So add a flag to queue_limits.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
We need a reserve for allocating buckets for new btree nodes - and now that
we've got multiple btrees, it really needs to be per btree.
This reworks the reserves so we've got separate freelists for each reserve
instead of watermarks, which seems to make things a bit cleaner, and it adds
some code so that btree_split() can make sure the reserve is available before it
starts.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>