When searching for an existing socket to reuse, the address family
is not taken into account - only port number. This means that an
IPv4 socket could be used for IPv6 traffic and vice versa, which
is sure to cause problems when passing packets.
It is not possible to trigger this problem currently because the
only user of Geneve creates just IPv4 sockets. However, that is
likely to change in the near future.
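A minimal userspace sketch of the intended lookup follows. The
struct and function names are hypothetical stand-ins rather than the
kernel's Geneve types; the point is only that both the address family
and the port must match before a socket is reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical stand-in for an open Geneve socket entry. */
struct sock_entry {
	int family;               /* AF_INET or AF_INET6 */
	uint16_t port;            /* UDP port, host byte order for simplicity */
	struct sock_entry *next;  /* simple singly linked list */
};

/* Return the entry matching both family and port, or NULL if none does. */
static struct sock_entry *find_sock(struct sock_entry *head, int family,
				    uint16_t port)
{
	assert(family == AF_INET || family == AF_INET6);

	for (struct sock_entry *s = head; s; s = s->next)
		if (s->family == family && s->port == port)
			return s;
	return NULL;
}
```

With a port-only comparison, a request for an IPv6 socket could return
an IPv4 entry that happens to use the same port.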
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.
In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the deletion happens asynchronously.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf says:
====================
rhashtable: Per bucket locks & deferred table resizing
Prepares for and introduces per bucket spinlocks and deferred table
resizing. This allows for parallel table mutations in different hash
buckets from atomic context. The resizing occurs in the background
in a separate worker thread while lookups, inserts, and removals can
continue.
Also modified the chain linked list to be terminated with a special
nulls marker to allow entries to move between multiple lists.
Last but not least, reintroduces lockless netlink_lookup() with
deferred Netlink socket destruction to avoid the side effect of
increased netlink_release() runtime.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup().
This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow for wider usage of rhashtable, use a special nulls
marker to terminate each chain. The reason for not using the existing
nulls_list is that the prev pointer usage would not be valid as entries
can be linked in two different buckets at the same time.
The 4 nulls base bits can be set through the rhashtable_params structure
like this:
	struct rhashtable_params params = {
		[...]
		.nulls_base = (1U << RHT_BASE_SHIFT),
	};
This reduces the hash length from 32 bits to 27 bits.
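The bit budget can be sketched as follows, assuming the marker is
stored as an odd pointer-sized value: 1 low bit flags the marker, 4
bits carry the base, and the remaining 27 bits carry the hash. The
macro and function names below are illustrative and are not the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_BITS  4                       /* assumed: 4 nulls base bits */
#define HASH_BITS  27                      /* 32 - 1 marker bit - 4 base bits */
#define BASE_SHIFT HASH_BITS
#define HASH_MASK  ((1U << HASH_BITS) - 1)

/* Build an end-of-chain marker: low bit set, base + hash above it. */
static uintptr_t nulls_marker(uint32_t base, uint32_t hash)
{
	/* The base must live entirely above the 27 hash bits. */
	assert((base & HASH_MASK) == 0);
	return ((uintptr_t)(base + (hash & HASH_MASK)) << 1) | 1;
}

/* A real entry pointer is at least 2-byte aligned, so its low bit is 0. */
static bool is_nulls(uintptr_t p)
{
	return p & 1;
}

static uint32_t nulls_hash(uintptr_t p)
{
	return (uint32_t)(p >> 1) & HASH_MASK;
}

static uint32_t nulls_base(uintptr_t p)
{
	return (uint32_t)(p >> 1) >> BASE_SHIFT;
}
```

A walker that reaches a marker can compare the embedded hash with the
bucket it started in to detect that it has ended up on a different
chain.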
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.
The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.
Mutations of the bucket table pointer are protected by a new mutex; read
access is RCU protected.
In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so-called future table as soon as the resize process
starts. Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table once an RCU grace period has
elapsed and the initial linking of the old to the new table has been
performed. Optimization of the chains to make use of the new number of
buckets follows only once the new table is in use.
The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.
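The two-table lookup can be sketched in plain C as shown here. This
is a simplified model with hypothetical types: it omits RCU and
locking entirely and assumes power-of-two table sizes, so it only
illustrates the order in which the tables are searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {
	uint32_t key;
	struct node *next;
};

struct bucket_table {
	size_t size;              /* number of buckets, power of two */
	struct node **buckets;
};

struct ht {
	struct bucket_table *tbl;         /* main table */
	struct bucket_table *future_tbl;  /* equals tbl when no resize runs */
};

static struct node *table_find(const struct bucket_table *t, uint32_t key)
{
	assert(t->size && (t->size & (t->size - 1)) == 0);

	for (struct node *n = t->buckets[key & (t->size - 1)]; n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

/* Search the main table first, then the future table if a resize is on. */
static struct node *ht_lookup(const struct ht *ht, uint32_t key)
{
	struct node *n = table_find(ht->tbl, key);

	if (!n && ht->future_tbl != ht->tbl)
		n = table_find(ht->future_tbl, key);
	return n;
}
```

A traversal that only walks the main table would miss an entry that
was inserted into the future table, which is the side effect described
above.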
Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.
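A rough userspace sketch of the lock array and the atomic element
counter is given below. It uses C11 atomic_flag as a stand-in for a
kernel spinlock, and the sizes and names are assumptions chosen for
illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 16   /* power of two, illustrative */
#define NUM_LOCKS   4    /* power of two, fewer locks than buckets */

struct entry {
	uint32_t key;
	struct entry *next;
};

struct table {
	atomic_flag locks[NUM_LOCKS];
	struct entry *buckets[NUM_BUCKETS];
	atomic_uint nelems;              /* updated from several buckets */
};

static void table_init(struct table *t)
{
	for (size_t i = 0; i < NUM_LOCKS; i++)
		atomic_flag_clear(&t->locks[i]);
	for (size_t i = 0; i < NUM_BUCKETS; i++)
		t->buckets[i] = NULL;
	atomic_init(&t->nelems, 0);
}

static size_t bucket_of(uint32_t hash)
{
	return hash & (NUM_BUCKETS - 1);
}

/* Several buckets map to one lock; buckets with different locks run in parallel. */
static size_t lock_of(uint32_t hash)
{
	return bucket_of(hash) & (NUM_LOCKS - 1);
}

static void table_insert(struct table *t, struct entry *e)
{
	atomic_flag *lock = &t->locks[lock_of(e->key)];
	size_t b = bucket_of(e->key);

	assert(e != NULL);
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;  /* spin until the bucket lock is free */
	e->next = t->buckets[b];
	t->buckets[b] = e;
	atomic_flag_clear_explicit(lock, memory_order_release);

	/* No single lock covers all buckets, so the count must be atomic. */
	atomic_fetch_add(&t->nelems, 1);
}
```

The key is used directly as the hash here to keep the sketch short.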
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().
With the introduction of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.
Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.
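The trade-off can be illustrated with a small sketch in which removal
walks the chain again to find the link pointing at the element,
instead of trusting a pointer saved during an earlier lookup. The
types are hypothetical and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct elem {
	uint32_t key;
	struct elem *next;
};

/*
 * Unlink victim from the chain starting at *head. The previous link is
 * rediscovered on every call, so it cannot be stale.
 */
static bool chain_remove(struct elem **head, struct elem *victim)
{
	assert(head != NULL && victim != NULL);

	for (struct elem **pprev = head; *pprev; pprev = &(*pprev)->next) {
		if (*pprev == victim) {
			*pprev = victim->next;
			victim->next = NULL;
			return true;
		}
	}
	return false;  /* not on this chain (any more) */
}
```

The extra walk costs time proportional to the chain length, which is
the "slightly more expensive on average" part of the description.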
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches will require access to the bucket tail. Access
to the tail is relatively cheap as the automatic resizing of the
table should keep the number of entries per bucket to no more
than 0.75 on average.
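As a rough illustration, the sketch below walks a chain to its tail
and shows a 75% load check of the kind that would keep chains short.
Both helpers are hypothetical simplifications and not the rhashtable
functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	struct item *next;
};

/* Return the address of the terminating link of the chain at *head. */
static struct item **bucket_tail(struct item **head)
{
	struct item **pp = head;

	assert(head != NULL);
	while (*pp)
		pp = &(*pp)->next;
	return pp;
}

/* True once the table holds more than 0.75 entries per bucket on average. */
static bool above_75_percent(size_t nelems, size_t nbuckets)
{
	return nelems > nbuckets / 4 * 3;
}
```

With an average of less than one entry per bucket, the loop in
bucket_tail() usually runs for zero or one iteration.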
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.
It introduces a barrier() to the RCU iterators to prevent
the compiler from caching the first element.
The lockdep verifier is introduced as a stub which always succeeds
and is properly implemented in the next patch when the locks are
introduced.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows the hashing functions to be
simplified and kept private.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran says:
====================
Fixing the "Time Counter fixes and improvements"
For this series I had only tested the build with ARCH=x86 and arm, but
others like sparc64, microblaze, powerpc, and s390 will fail because
they somehow don't indirectly include clocksource.h for the drivers in
question.
This series fixes the build issues reported by:
kbuild test robot <fengguang.wu@intel.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The timecounter/cyclecounter code has moved, so users need the new include.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver uses the function, clocksource_khz2mult, and so it really must
include clocksource.h.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for users of the timecounter/cyclecounter code to include
clocksource.h just for a single macro.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OVS development is moved to netdev mailing list. Update tree and
list in MAINTAINERS file.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'wireless-drivers-next-for-davem-2015-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Changes:
* ath9k: enable Transmit Power Control (TPC) for ar9003 chips
* rtlwifi: cleanup and updates from the vendor driver
* rsi: fix memory leak related to firmware image
* ath: parameter fix for FCC DFS pattern
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes some functions that are not used anywhere:
enic_dev_enable2_done() enic_dev_enable2() enic_dev_deinit_done()
enic_dev_init_prov2() enic_vnic_dev_deinit()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes some functions that are not used anywhere:
Read_hfc32() Write_hfc32() Write_hfc16()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the function smt_ifconfig() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes some functions that are not used anywhere:
dbgi_rd_rsp3() dbgi_wr_addr3()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not needed, only four cases:
- kfree_skb (or one of its aliases).
Don't need to zero, memory will be freed.
- kfree_skb_partial and head was stolen: memory will be freed.
- skb_morph: The skb header fields (including tc ones) will be
copied over from the 'to-be-morphed' skb right after
skb_release_head_state returns.
- skb_segment: Same as before, all the skb header
fields are copied over from the original skb right away.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2014-12-31
Here's the first batch of bluetooth patches for 3.20.
- Cleanups & fixes to ieee802154 drivers
- Fix synchronization of mgmt commands with respective HCI commands
- Add self-tests for LE pairing crypto functionality
- Remove 'BlueFritz!' specific handling from core using a new quirk flag
- Public address configuration support for ath3012
- Refactor debugfs support into a dedicated file
- Initial support for LE Data Length Extension feature from Bluetooth 4.2
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This results in an approximately 30% increase in throughput
when handling encapsulated bulk traffic.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to set the clock speed in read/write, where it would be
performed unnecessarily for each mdio access. Initializing it during probe
is enough.
Also, a hardcoded clock value is not appropriate for all SoCs.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Which is wrong and not used, so no extra space needed by
mdiobus_alloc_size(), use mdiobus_alloc() instead.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the reset is just clock setting, individual mdio reset is
not available.
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Chen says:
====================
support GMAC driver for RK3288
Roger Chen (6):
patch1: add driver for Rockchip RK3288 SoCs integrated GMAC
patch2: define clock ID used for GMAC
patch3: modify CRU config for Rockchip RK3288 SoCs integrated GMAC
patch4: dts: rockchip: add gmac info for rk3288
patch5: dts: rockchip: enable gmac on RK3288 evb board
patch6: add document for Rockchip RK3288 GMAC
Tested on rk3288 evb board:
Execute the following commands to enable ethernet,
set the local IP and ping a remote host:
    busybox ifconfig eth0 up
    busybox ifconfig eth0 192.168.1.111
    ping 192.168.1.1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The document describes how to add properties for GMAC in device tree.
changes since v2:
1. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
2. add "snps,reset-gpio", "snps,reset-active-low;" "snps,reset-delays-us"
Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
enable gmac in rk3288-evb-rk808.dts
changes since v2:
1. add fixed regulator for PHY
2. remove power-gpio, reset-gpio, phyirq-gpio, pmu_regulator setting
3. add "snps,reset-gpio", "snps,reset-active-low;" "snps,reset-delays-us"
Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add gmac info in rk3288.dtsi for GMAC driver
changes since v2:
1. add drive-strength in the pinctrl settings
Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
modify CRU config for GMAC driver
changes since v2:
1. remove SCLK_MAC_PLL
Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver is based on stmmac driver.
changes since v2:
- use tab instead of space for macros
- use HIWORD_UPDATE macro for GMAC_CLK_RX_DL_CFG and GMAC_CLK_TX_DL_CFG
- remove drive-strength setting in the driver and set it in the pinctrl settings
- use dev_err instead of pr_err
- remove clock names' macros, just use the real name of the clock
- use devm_clk_get() instead of clk_get()
- remove clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll)
- remove gpio setting for LDO, just use regulator API
- remove phy reset using gpio in the glue layer, it has been handled in the stmmac driver
- remove handling phy interrupt (mii interrupt)
changes since v1:
- use BIT() to set register
- combine two regmap_write() operations into one for the same register
- use macros for register value setting
- remove grf fail check in rk_gmac_setup() and save all the check in set_rgmii_speed()
- remove .tx_coe=1 in rk_gmac_data
Signed-off-by: Roger Chen <roger.chen@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%
These patches are meant to address several performance issues I have seen
in the fib_trie implementation, and fib_table_lookup specifically. With
these changes in place I have seen a reduction of 35 to 75% in the
total time spent in fib_table_lookup depending on the type of search being
performed.
On a VM running on my Core i7-4930K system with a trie of maximum depth of 7
this resulted in a reduction of over 370ns per packet in the total time to
process packets received from an ixgbe interface and route them to a dummy
interface. This represents a failed lookup in the local trie followed by
a successful search in the main trie.
                                 Baseline       Refactor
ixgbe->dummy routing             1.20Mpps       2.21Mpps
--------------------------------------------------------
processing time per packet          835ns          453ns
fib_table_lookup             50.1%  418ns   25.0%  113ns
check_leaf.isra.9             7.9%   66ns      --     --
ixgbe_clean_rx_irq            5.3%   44ns    9.8%   44ns
ip_route_input_noref          2.9%   25ns    4.6%   21ns
pvclock_clocksource_read      2.6%   21ns    4.6%   21ns
ip_rcv                        2.6%   22ns    4.0%   18ns
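As a rough cross-check of the routing figures, the per-packet time is
approximately the reciprocal of the packet rate, and the relative
saving in a function follows from its before and after times. The
helper names below are hypothetical and exist only for this
calculation.

```c
#include <assert.h>

/* Convert a rate in millions of packets per second to ns per packet. */
static double ns_per_packet(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;
}

/* Percentage reduction when a cost drops from before to after. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

For example, ns_per_packet(1.20) gives roughly 833 ns, close to the
835 ns reported for the baseline, and reduction_pct(418, 113) gives
roughly 73% for fib_table_lookup in the routing case.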
In the simple case of receiving a frame and dropping it before it can reach
the socket layer I saw a reduction of 40ns per packet. This represents a
trip through the local trie with the correct leaf found with no need for
any backtracing.
                                 Baseline       Refactor
ixgbe->local receive             2.65Mpps       2.96Mpps
--------------------------------------------------------
processing time per packet          377ns          337ns
fib_table_lookup             25.1%   95ns   25.8%   87ns
ixgbe_clean_rx_irq            8.7%   33ns    9.0%   30ns
check_leaf.isra.9             7.2%   27ns      --     --
ip_rcv                        5.7%   21ns    6.5%   22ns
These changes have resulted in several functions being inlined such as
check_leaf and fib_find_node, but due to the code simplification the
overall size of the code has been reduced.
   text    data     bss     dec     hex filename
  16932     376      16   17324    43ac net/ipv4/fib_trie.o - before
  15259     376       8   15643    3d1b net/ipv4/fib_trie.o - after
Changes since RFC:
Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode. With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.
By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.
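The bookkeeping can be sketched as follows, assuming 32-bit keys so
that a prefix of length plen has a suffix length of 32 - plen. The
struct and helpers are hypothetical simplifications and do not model
the real tnode layout or the lookup loop.

```c
#include <assert.h>
#include <stdbool.h>

#define KEYLENGTH 32

/* Hypothetical reduced tnode: only the fields needed for the check. */
struct tnode_sketch {
	unsigned char pos;   /* bit position this node indexes from */
	unsigned char slen;  /* longest suffix length of any prefix below */
};

static unsigned char suffix_len(unsigned char plen)
{
	assert(plen <= KEYLENGTH);
	return (unsigned char)(KEYLENGTH - plen);
}

/* Record a prefix stored below tn, keeping the maximum suffix length. */
static void note_prefix(struct tnode_sketch *tn, unsigned char plen)
{
	if (suffix_len(plen) > tn->slen)
		tn->slen = suffix_len(plen);
}

/* Only nodes whose longest suffix exceeds pos are worth revisiting. */
static bool may_need_backtrace(const struct tnode_sketch *tn)
{
	return tn->slen > tn->pos;
}
```

A node holding only /32 prefixes keeps slen at 0, so the check fails
for it and a lookup can skip it when backtracing.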
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change. As such every call to tnode_get_child was rerunning
tnode_child_length which ended up consuming quite a bit of space in the
resultant assembly code.
I have gone through and verified that in all cases where tnode_get_child
is used we are either winding through a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.
size net/ipv4/fib_trie.o
Before:
   text    data     bss     dec     hex filename
  15506     376       8   15890    3e12 net/ipv4/fib_trie.o
After:
   text    data     bss     dec     hex filename
  14827     376       8   15211    3b6b net/ipv4/fib_trie.o
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well. By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.
I suspect this will likely fix some concurrency issues, though I don't
have a good test to show as such.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change pushes the tnode freeing down into the inflate and halve
functions. It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.
I believe this may address a bug in the freeing logic as well. For some
reason if the freelist got to a certain size we would call
synchronize_rcu(). I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu(). As such that is what I have updated the behavior to be.
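The updated behavior can be modeled in userspace roughly as shown
below. The threshold value, the counters and fake_synchronize() are
assumptions standing in for the real limit, call_rcu() and
synchronize_rcu(), none of which exist outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit on memory handed off before forcing a grace period. */
#define SYNC_THRESHOLD ((size_t)4096 * 128)

static size_t queued_bytes;   /* memory handed off since the last sync */
static unsigned sync_calls;   /* how often the stand-in sync ran */

/* Stand-in for synchronize_rcu(); only counts invocations. */
static void fake_synchronize(void)
{
	sync_calls++;
}

/* Account for one node that has been handed off for deferred freeing. */
static void queue_free(size_t bytes)
{
	assert(bytes > 0);

	/* Real code would pass the node to call_rcu() at this point. */
	queued_bytes += bytes;
	if (queued_bytes >= SYNC_THRESHOLD) {
		queued_bytes = 0;
		fake_synchronize();
	}
}
```

The synchronization is triggered by the amount of memory already
handed off, not by the length of a pending free list.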
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize. By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>