We remove a number of unnecessary variables and branches
in TIPC. This patch is cosmetic and does not change the
operation of TIPC in any way.
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In early versions of TIPC it was possible to administratively block
individual links through the use of the member flag 'blocked'. This
functionality was deemed redundant, and since commit 7368dd ("tipc:
clean out all instances of #if 0'd unused code"), this flag has been
unused.
In the current code, a link only needs to be blocked for sending and
reception if it is subject to an ongoing link failover. In that case,
it is sufficient to check if the number of expected failover packets
is non-zero, something which is done via the funtion 'link_blocked()'.
This commit finally removes the redundant 'blocked' flag completely.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently TIPC supports two L2 media types, Ethernet and Infiniband.
Because both these media are accessed through the common net_device API,
several functions in the two media adaptation files turn out to be
fully or almost identical, leading to unnecessary code duplication.
In this commit we extract this common code from the two media files
and move them to the generic bearer.c. Additionally, we change
the function names to reflect their real role: to access L2 media,
irrespective of type.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, registering a TIPC stack handler in the network device layer
is done twice, once for Ethernet (eth_media) and Infiniband (ib_media)
repectively. But, as this registration is not media specific, we can
avoid some code duplication by moving the registering function to
the generic bearer layer, to the file bearer.c, and call it only once.
The same is true for the network device event notifier.
As a side effect, the two workqueues we are using for for setting up/
cleaning up media can now be eliminated. Furthermore, the array for
storing the specific media type structs, media_array[], can be entirely
deleted.
Note that the eth_started and ib_started flags were removed during the
code relocation. There is now only one call to bearer_setup and
bearer_cleanup, and these can logically not race against each other.
Despite its size, this cleanup work incurs no functional changes in TIPC.
In particular, it should be noted that the sequence ordering of received
packets is unaffected by this change, since packet reception never was
subject to any work queue handling in the first place.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC is currently using the field 'af_packet_priv' in struct net_device
as a handle to find the bearer instance associated to the given network
device. But, by doing so it is blocking other networking cleanups, such
as the one discussed here:
http://patchwork.ozlabs.org/patch/178044/
This commit removes this usage from TIPC. Instead, we introduce a new
field, 'tipc_ptr', to the net_device structure, to serve this purpose.
When TIPC bearer is enabled, the bearer object is associated to
'tipc_ptr'. When a TIPC packet arrives in the recv_msg() upcall
from a networking device, the bearer object can now be obtained from
'tipc_ptr'. When a bearer is disabled, the bearer object is detached
from its underlying network device by setting 'tipc_ptr' to NULL.
Additionally, an RCU lock is used to protect the new pointer.
Henceforth, the existing tipc_net_lock is used in write mode to
serialize write accesses to this pointer, while the new RCU lock is
applied on the read side to ensure that the pointer is 100% valid
within its wrapped area for all readers.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct 'tipc_media' represents the specific info that the media
layer adaptors (eth_media and ib_media) expose to the generic
bearer layer. We clarify this by improved commenting, and by giving
the 'media_list' array the more appropriate name 'media_info_array'.
There are no functional changes in this commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Communication media types are abstracted through the struct 'tipc_media',
one per media type. These structs are allocated statically inside their
respective media file.
Furthermore, in order to be able to reach all instances from a central
location, we keep a static array with pointers to these structs. This
array is currently initialized at runtime, under protection of
tipc_net_lock. However, since the contents of the array itself never
changes after initialization, we can just as well initialize it at
compile time and make it 'const', at the same time making it obvious
that no lock protection is needed here.
This commit makes the array constant and removes the redundant lock
protection.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_buff lists are currently relased by looping over the list and
explicitly releasing each buffer.
We replace all occurrences of this loop with a call to kfree_skb_list().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'handler_enabled' is a global flag indicating whether the TIPC
signal handling service is enabled or not. The lack of lock
protection for this flag incurs a risk for contention, so that
a tipc_k_signal() call might queue a signal handler to a destroyed
signal queue, with unpredictable results. To correct this, we let
the already existing 'qitem_lock' protect the flag, as it already
does with the queue itself. This way, we ensure that the flag
always is consistent across all cores.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'signal handler' service in TIPC is a mechanism that makes it
possible to postpone execution of functions, by launcing them into
a job queue for execution in a separate tasklet, independent of
the launching execution thread.
When we do rmmod on the tipc module, this service is stopped after
the network service. At the same time, the stopping of the network
service may itself launch jobs for execution, with the risk that these
functions may be scheduled for execution after the data structures
meant to be accessed by the job have already been deleted. We have
seen this happen, most often resulting in an oops.
This commit ensures that the signal handler is the very first to be
stopped when TIPC is shut down, so there are no surprises during
the cleanup of the other services.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.
This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.
Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.
CPU0 CPU1
---- ----
Time 0: bearer_disable() link_timeout()
Time 1: spin_lock_bh(&b_ptr->lock) tipc_link_push_queue()
Time 2: tipc_link_delete() tipc_bearer_blocked(b_ptr)
Time 3: k_cancel_timer(&req->timer) spin_lock_bh(&b_ptr->lock)
Time 4: del_timer_sync(&req->timer)
I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.
We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.
An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated to the bearer immediately after an
interface goes down.
A user might notice one slight change in link behaviour after this
change. When an interface goes down, (e.g. through a NETDEV_DOWN
event) all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.
The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following Smatch warning:
net/tipc/link.c:2364 tipc_link_recv_fragment()
warn: variable dereferenced before check '*head' (see line 2361)
A null pointer might be passed to skb_try_coalesce if
a malicious sender injects orphan fragments on a link.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If appending a received fragment to the pending fragment chain
in a unicast link fails, the current code tries to force a retransmission
of the fragment by decrementing the 'next received sequence number'
field in the link. This is done under the assumption that the failure
is caused by an out-of-memory situation, an assumption that does
not hold true after the previous patch in this series.
A failure to append a fragment can now only be caused by a protocol
violation by the sending peer, and it must hence be assumed that it
is either malicious or buggy. Either way, the correct behavior is now
to reset the link instead of trying to revert its sequence number.
So, this is what we do in this commit.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.
Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.
To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.
Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking statistics or changing parameters on a link, the
link_find_link function is used to locate the link with a given
name. The complex method of deconstructing the name into local
and remote address/interface is error prone and may fail if the
interface names contains special characters. We change the lookup
method to iterate over the list of nodes and compare the link
names.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
link_cmd_set_value() takes commands for link, bearer and media related
configuration. Genereally the function returns 0 when a command is
recognized, and -EINVAL when it is not. However, in the switch for link
related commands it returns 0 even when the command is unrecognized. This
will sometimes make it look as if a failed configuration command has been
successful, but has otherwise no negative effects.
We remove this anomaly by returning -EINVAL even for link commands. We also
rework all three switches to make them conforming to common kernel coding
style.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, rcv_msg() always returns zero on a packet delivery upcall
from net_device.
To make its behavior more compliant with the way this API should be
used, we change this to let it return NET_RX_SUCCESS (which is zero
anyway) when it is able to handle the packet, and NET_RX_DROP otherwise.
The latter does not imply any functional change, it only enables the
driver to keep more accurate statistics about the fate of delivered
packets.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_block_bearer() currently takes a bearer name (const char*)
as argument. This requires the function to make a lookup to find
the pointer to the corresponding bearer struct. In the current
code base this is not necessary, since the only two callers
(tipc_continue(),recv_notification()) already have validated
copies of this pointer, and hence can pass it directly in the
function call.
We change tipc_block_bearer() to directly take struct tipc_bearer*
as argument instead.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC 'bearer' exists as an abstract concept, while 'media'
is deemed a specific implementation of a bearer, such as Ethernet
or Infiniband media. When a component inside TIPC wants to control
a specific media, it only needs to access the generic bearer API
to achieve this. However, in the current media implementations,
the 'bearer' name is also extensively used in media specific
function and variable names.
This may create confusion, so we choose to replace the term 'bearer'
with 'media' in all function names, variable names, and prefixes
where this is what really is meant.
Note that this change is cosmetic only, and no runtime behaviour
changes are made here.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_msg_build() now copies message data from iovec to skb_buff
using memcpy_fromiovecend(), which doesn't need to be passed the
iovec length to perform the copying.
So we remove the parameter indicating iovec length in all
functions where TIPC messages are built and sent.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tipc_msg_build() calls skb_copy_to_linear_data_offset() to copy data
from user space to kernel space. However, the latter function does
in its turn call memcpy() to perform the actual copying. This poses
an obvious security and robustness risk, since memcpy() never makes
any validity check on the pointer it is copying from.
To correct this, we the replace the offending function call with
a call to memcpy_fromiovecend(), which uses copy_from_user() to
perform the copying.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of passing each byte by stack let's use nice specifier for that.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert enable_bearer() to RCU locking with dev_get_by_name().
Based on a similar changeset in commit 840a185d ["aoe: remove
dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that:
"dev_base_lock is the legacy way to lock the device list,
and is planned to disappear. (writers hold RTNL, readers
hold RCU lock)"
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb buffer cannot be allocated in link_send_sections_long(),
-ENOMEM error code instead of -EFAULT should be returned to its
caller.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once message build request function returns invalid code, the
process of sending message cannot continue. So in case of message
build failure, tipc_link_send_sections_fast() should return
immediately.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pfifo_fast is set as default traffic class queueing discipline. This
queue has three so called "bands". Within each band, FIFO rules apply.
However, as long as there are packets waiting in band 0, band 1 won't
be processed.
Now all kind of TIPC type packet priorities are never set, that is,
their priorities are 0, so they are mapped to band 1 of pfifo_fast
qdisc. But, especially during link congestion, if link protocol packet
can be sent out as earlier as possible than other type of packets so
that protocol packet can arrive at peer endpoint in time, the peer
will timely reset its link timeout timer to keep the link alive.
So enhancing the priority of link protocol packets can meet the
specific demand to avoid unnecessary link reset due to a transient
link congestion.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No runtime code changes here. Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Directly save sock structure pointer instead of void pointer to avoid
unnecessary cast conversions.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the configuration server is now running under process context,
it's unnecessary for us to have a spinlock serializing the TIPC
configuration process. Instead, we replace it with a mutex lock,
which gives us more freedom. For instance, we can now call
pre-emptable functions within the protected area.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the removal of the native API, there is now only one way to
to create a TIPC port instance -- the function tipc_createport_raw().
We make it more readable by renaming it to tipc_createport().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the native API has been completely removed, the 'user_port'
field in struct tipc_port becomes unused, and can be removed.
As a consequence, the "usrmem" argument in tipc_msg_build() is no
longer needed, and so we remove that one too.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having completed the conversion of the topology server and
configuration server to use the new server infrastructure,
the following functions become unused, and can be deleted:
- tipc_createport()
- port_wakeup_sh()
- port_dispatcher()
- port_dispatcher_sigh()
- tipc_send_buf_fast()
- tipc_send_buf2port
Additionally, the following variables become orphaned,
and can be deleted:
- tipc_msg_err_event
- tipc_named_msg_err_event
- tipc_conn_shutdown_event
- tipc_msg_event
- tipc_named_msg_event
- tipc_conn_msg_event
- tipc_continue_event
- msg_queue_head
- msg_queue_tail
- queue_lock
Deletion is done here in a separate commit in order to allow
the actual conversion changes to be more easily viewed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the new socket-based TIPC server infrastructure has been
introduced, we can now convert the configuration server to use
it. Then we can take future steps to simplify the configuration
server locking policy.
Some minor reordering of initialization is done, due to the
dependency on having tipc_socket_init completed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it. We get two benefits
from doing this:
1) It simplifies the topology server locking policy. In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of tipc_subscriber
instance. That is, we only used one lock to ensure both
tipc_port and tipc_subscriber members were safely accessed.
Now we introduce another spin lock for tipc_subscriber structure
only protecting themselves, to get a finer granularity locking
policy. Moreover, the change will allow us to make the topology
server code more readable and maintainable.
2) It fixes a bug where sent subscription events may be lost when
the topology port is congested. Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context. The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.
Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter. And overall, we see a simplification of the
TIPC subscriber code in making this changeover.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.
As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policiy is required
in order to protect TIPC internal resources.
To eliminate the need for this complex lock policiy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the toplogy and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.
The new service also solves another problem:
As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.
Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.
As both sending and receive messages are now finished in processes,
subscribed events cannot be dropped any more.
As of this commit, this new server infrastructure is built, but
not actually yet called by the existing TIPC code, but since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC's implied connect feature, aka piggyback connect, allows
applications to save one syscall and all SYN/SYN-ACK signalling
overhead when setting up a connection. Until now, this has only
been supported for SEQPACKET sockets. Here, we make it possible
to use this feature even with stream sockets.
At the connecting side, the connection is completed when the
first data message arrives from the accepting peer. This means
that we must allow the connecting user to call blocking recv()
before the socket has reached state SS_CONNECTED. So we must must
relax the state machine check at recv_stream(), and allow the
recv() call even if socket is in state SS_CONNECTING.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.
Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.
To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command. It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.
By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.
Originally-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: Ying Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
The worry here is that fragm_sz could be zero since it comes from
skb->data.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bearer_id here comes from skb->data and it can be a number from 0 to
7. The problem is that the ->links[] array has only 2 elements so I
have added a range check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending packets, TIPC bearers use skb_clone() before writing their
hardware header. This will however NOT copy the data buffer.
So when the same packet is sent over multiple bearers (to reach multiple
nodes), the same socket buffer data will be treated by multiple
tipc_media drivers which will write their own hardware header through
dev_hard_header().
Most of the time this is not a problem, because by the time the
packet is processed by the second media, it has already been sent over
the first one. However, when the first transmission is delayed (e.g.
because of insufficient bandwidth or through a shaper), the next bearer
will overwrite the hardware header, resulting in the packet being sent:
a) with the wrong source address, when bearers of the same type,
e.g. ethernet, are involved
b) with a completely corrupt header, or even dropped, when bearers of
different types are involved.
So when the same socket buffer is to be sent multiple times, send a
pskb_copy() instead (from the second instance on), and release it
afterwards (the bearer will skb_clone() it anyway).
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add InfiniBand media type based on the ethernet media type.
The only real difference is that in case of InfiniBand, we need the entire
20 bytes of space reserved for media addresses, so the TIPC media type ID is
not explicitly stored in the packet payload.
Sample output of tipc-config:
# tipc-config -v -addr -netid -nt=all -p -m -b -n -ls
node address: <10.1.4>
current network id: 4711
Type Lower Upper Port Identity Publication Scope
0 167776257 167776257 <10.1.1:1855512577> 1855512578 cluster
167776260 167776260 <10.1.4:1216454657> 1216454658 zone
1 1 1 <10.1.4:1216479235> 1216479236 node
Ports:
1216479235: bound to {1,1}
1216454657: bound to {0,167776260}
Media:
eth
ib
Bearers:
ib:ib0
Nodes known:
<10.1.1>: up
Link <broadcast-link>
Window:20 packets
RX packets:0 fragments:0/0 bundles:0/0
TX packets:0 fragments:0/0 bundles:0/0
RX naks:0 defs:0 dups:0
TX naks:0 acks:0 dups:0
Congestion bearer:0 link:0 Send queue max:0 avg:0
Link <10.1.4:ib0-10.1.1:ib0>
ACTIVE MTU:2044 Priority:10 Tolerance:1500 ms Window:50 packets
RX packets:80 fragments:0/0 bundles:0/0
TX packets:40 fragments:0/0 bundles:0/0
TX profile sample:22 packets average:54 octets
0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
RX states:410 probes:213 naks:0 defs:0 dups:0
TX states:410 probes:197 naks:0 acks:0 dups:0
Congestion bearer:0 link:0 Send queue max:1 avg:0
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb->protocol field is used by packet classifiers and for AF_PACKET
cooked format, TIPC needs to set it properly.
Fixes packet classification and ethertype of 0x0000 in cooked captures:
Out 20:c9:d0:43:12:d9 ethertype Unknown (0x0000), length 56:
0x0000: 5b50 0028 0000 30d4 0100 1000 0100 1001 [P.(..0.........
0x0010: 0000 03e8 0000 0001 20c9 d043 12d9 0000 ...........C....
0x0020: 0000 0000 0000 0000 ........
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some network protocols, like InfiniBand, don't have a fixed broadcast
address but one that depends on the configuration. Move the bcast_addr
to struct tipc_bearer and initialize it with the broadcast address of
the network device when the bearer is enabled.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/nfc/microread/mei.c
net/netfilter/nfnetlink_queue_core.c
Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.
Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.
Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers all
over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
If you need me to provide a merged tree to handle these resolutions,
please let me know.
Other than those patches, there's not much here, some minor fixes and
updates.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
=yWAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
Pull in 'net' to take in the bug fixes that didn't make it into
3.8-final.
Also, deal with the semantic conflict of the change made to
net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release
was added to 'net', but in 'net-next' we no longer cache the
neighbour entries in the ipv6 routes so that change is not
appropriate there.
Signed-off-by: David S. Miller <davem@davemloft.net>
As the number of iovecs in a send request is already limited within
UIO_MAXIOV(i.e. 1024) in __sys_sendmsg(), it's unnecessary to check it
again in TIPC stack.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Change overload control to be purely byte-based, using
sk->sk_rmem_alloc as byte counter, and compare it to a calculated
upper limit for the socket receive queue.
For all connection messages, irrespective of message importance,
the overload limit is set to a constant value (i.e, 67MB). This
limit should normally never be reached because of the lower
limit used by the flow control algorithm, and is there only
as a last resort in case a faulty peer doesn't respect the send
window limit.
For datagram messages, message importance is taken into account
when calculating the overload limit. The calculation is based
on sk->sk_rcvbuf, and is hence configurable via the socket option
SO_RCVBUF.
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The tipc function discard_rx_queue() is just a duplicated
implementation of __skb_queue_purge(). Remove the former
and directly invoke __skb_queue_purge().
In doing so, the underscores convey to the code reader, more
information about the current locking state that is assumed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
After commit 3c294cb3 "tipc: remove the bearer congestion mechanism",
we try to grab the broadcast bearer lock when sending multicast
messages over the broadcast link. This will cause an oops because
the lock is never initialized. This is an old bug, but the lock
was never actually used before commit 3c294cb3, so that why it was
not visible until now. The oops will look something like:
BUG: spinlock bad magic on CPU#2, daemon/147
lock: bcast_bearer+0x48/0xffffffffffffd19a [tipc],
.magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 147, comm: daemon Not tainted 3.8.0-rc3+ #206
Call Trace:
spin_dump+0x8a/0x8f
spin_bug+0x21/0x26
do_raw_spin_lock+0x114/0x150
_raw_spin_lock_bh+0x19/0x20
tipc_bearer_blocked+0x1f/0x40 [tipc]
tipc_link_send_buf+0x82/0x280 [tipc]
? __alloc_skb+0x9f/0x2b0
tipc_bclink_send_msg+0x77/0xa0 [tipc]
tipc_multicast+0x11b/0x1b0 [tipc]
send_msg+0x225/0x530 [tipc]
sock_sendmsg+0xca/0xe0
The above can be triggered by running the multicast demo program.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Jon Maloy <jon.maloy@ericsson.com>
CC: Allan Stephens <allan.stephens@windriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.
Here, we simply flip the check of the if, so that the main execution
path stays at the same indentation level, which improves readability.
If the allocation fails, we jump to an already existing exit label.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:
[ INFO: possible recursive locking detected ]
---------------------------------------------
kworker/u:0/6 is trying to acquire lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]
but task is already holding lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_TIPC);
lock(sk_lock-AF_TIPC);
*** DEADLOCK ***
May be due to missing lock nesting notation
[...]
Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").
Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline. So no need for stable
backports or similar of this commit.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.
With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.
The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.
It is also now possible to call select() or poll() to wait for the
completion of a connection.
An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine. In the process, we convert the
chain of if/else into a switch/case for improved readability.
A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type". This happens because existing
tipc code pseudo extends the default linux socket state values with:
#define SS_LISTENING -1 /* socket is listening */
#define SS_READY -2 /* socket is connectionless */
It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently we have tipc_disconnect and tipc_disconnect_port. It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.
If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:
__tipc_disconnect: "raw" function that does all the work.
tipc_disconnect: wrapper that deals with locking and then calls
the real core __tipc_disconnect function
With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.
On the connect side of things, we currently only have the single
"tipc_connect2port" function. It does both the locking at enter/exit,
and the core of the work. Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.
Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency. In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.
On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously. We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.
This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.
Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.
As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Each link instance has a periodic job checking if there is a stale
ongoing message reassembly associated to the link. If no new
fragment has been received during the last 4*[link_tolerance] period,
it is assumed the missing fragment will never arrive. As a consequence,
the reassembly buffer is discarded, and a gap in the message sequence
occurs.
This assumption is wrong. After we abandoned our ambition to develop
packet routing for multi-cluster networks, only single-hop packet
transfer remains as an option. For those, all packets are guaranteed
to be delivered in sequence to the defragmentation layer. Any failure
to achieve sequenced delivery will eventually lead to link reset, and
the reassembly buffer will be flushed anyway.
So we just remove this periodic check, which is now obsolete.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: also delete get/inc_timer count, since they are now unused]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There used to be a time when TIPC had lots of Kconfig knobs the
end user could alter, but they have all been made automatic or
obsolete, with the exception of CONFIG_TIPC_PORTS. This
previously existing set of options was all hidden under the
TIPC_ADVANCED setting, which does not exist in any code, but
only in Kconfig scope.
Having this now, just to hide the one remaining "advanced"
option no longer makes sense. Remove it. Also get rid of the
ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
possibly undefined.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
As the variable:node is currently defined to u32 type, it is
unnecessary to cast its type to u32 again when using it.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Upon establishing a first link between two nodes, there is
currently a risk that the two endpoints will disagree on exactly
which sequence number reception and acknowleding of broadcast
packets should start.
The following scenarios may happen:
1: Node A sends an ACTIVATE message to B, telling it to start acking
packets from sequence number N.
2: Node A sends out broadcast N, but does not expect an acknowledge
from B, since B is not yet in its broadcast receiver's list.
3: Node A receives ACK for N from all nodes except B, and releases
packet N.
4: Node B receives the ACTIVATE, activates its link endpoint, and
stores the value N as sequence number of first expected packet.
5: Node B sends a NAME_DISTR message to A.
6: Node A receives the NAME_DISTR message, and activates its endpoint.
At this moment B is added to A's broadcast receiver's set.
Node A also sets sequence number 0 as the first broadcast packet
to be received from B.
7: Node A sends broadcast N+1.
8: B receives N+1, determines there is a gap in the sequence, since
it is expecting N, and sends a NACK for N back to A.
9: Node A has already released N, so no retransmission is possible.
The broadcast link in direction A->B is stale.
In addition to, or instead of, 7-9 above, the following may happen:
10: Node B sends broadcast M > 0 to A.
11: Node A receives M, falsely decides there must be a gap, since
it is expecting packet 0, and asks for retransmission of packets
[0,M-1].
12: Node B has already released these packets, so the broadcast
link is stale in direction B->A.
We solve this problem by introducing a new unicast message type,
BCAST_PROTOCOL/STATE, to convey the sequence number of the next
sent broadcast packet to the other endpoint, at exactly the moment
that endpoint is added to the own node's broadcast receivers list,
and before any other unicast messages are permitted to be sent.
Furthermore, we don't allow any node to start receiving and
processing broadcast packets until this new synchronization
message has been received.
To maintain backwards compatibility, we still open up for
broadcast reception if we receive a NAME_DISTR message without
any preceding broadcast sync message. In this case, we must
assume that the other end has an older code version, and will
never send out the new synchronization message. Hence, for mixed
old and new nodes, the issue arising in 7-12 of the above may
happen with the same probability as before.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node. Convert it to a bool at the same time,
since it is not used to store any numerical values.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The "supportable" flag in bclink structure is a compatibility flag
indicating whether a peer node is capable of receiving TIPC broadcast
messages. However, all TIPC versions since tipc-1.5, and after the
inclusion in the upstream Linux kernel in 2006, support this capability.
It is highly unlikely that anybody is still using such an old
version of TIPC, let alone that they want to mix it with TIPC-2.0
nodes. Therefore, we now remove the "supportable" flag.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Currently at the TIPC bearer layer there is the following congestion
mechanism:
Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer. When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.
However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.
But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer. So the code maintaing the state
information is not removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When a socket is shut down, we should wake up all thread sleeping on
it, instead of just one of them. Otherwise, when several threads are
polling the same socket, and one of them does shutdown(), the
remaining threads may end up sleeping forever.
Also, to align socket usage with common practice in other stacks, we
use one of the common socket callback handlers, sk_state_change(),
to wake up pending users. This is similar to the usage in e.g.
inet_shutdown(). [net/ipv4/af_inet.c].
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
socket during link congestion, the connect message will be discarded
and sendmsg will return EAGAIN. This is normal behavior, and the
application is expected to poll the socket until POLLOUT is set,
after which the connection attempt can be retried.
However, the POLLOUT flag is never set for unconnected sockets and
poll() always returns a zero mask. The application is then left without
a trigger for when it can make another attempt at sending the message.
The solution is to check if we're polling on an unconnected socket
and set the POLLOUT flag if the TIPC port owned by this socket
is not congested. The TIPC ports waiting on a specific link will be
marked as 'not congested' when the link congestion have abated.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
When an application blocks at poll/select on a TIPC socket
while requesting a specific event mask, both the filter_rcv() and
wakeupdispatch() case will wake it up unconditionally whenever
the state changes (i.e an incoming message arrives, or congestion
has subsided). No mask is used.
To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
with tipc_data_ready and tipc_write_space respectively, which makes
tipc more in alignment with the rest of the networking code. These
pass the exact set of possible events to the waker in fs/select.c
hence avoiding waking up blocked processes unnecessarily.
In doing so, we uncover another issue -- that there needs to be a
memory barrier in these poll/receive callbacks, otherwise we are
subject to the the same race as documented above wq_has_sleeper()
[in commit a57de0b4 "net: adding memory barrier to the poll and
receive callbacks"]. So we need to replace poll_wait() with
sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
in these two new functions.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
If tasklet_disable() is called before related tasklet handled,
tasklet_kill will never be finished. tasklet_kill is enough.
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Signed-off-by: David S. Miller <davem@davemloft.net>
When large buffers are sent over connected TIPC sockets, it
is likely that the sk_backlog will be filled up on the
receiver side, but the TIPC flow control mechanism is happily
unaware of this since that is based on message count.
The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
and drop it's side of the connection, leaving it stale on
the receiver end.
By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
overload caused by a full backlog queue and the flow control
will work properly.
This worst case value is the max TIPC message size times
the flow control window, multiplied by two because a sender
will transmit up to double the window size before a port is marked
congested.
We multiply this by 2 to account for the sk_buff and other overheads.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gets rid of the need for users to specify the maximum number of
name publications supported by TIPC. TIPC now automatically provides
support for the maximum number of name publications to 65535.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gets rid of the need for users to specify the maximum number of
name subscriptions supported by TIPC. TIPC now automatically provides
support for the maximum number of name subscriptions to 65535.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added to the following:
- tipc_random
- tipc_own_addr
- tipc_max_ports
- tipc_net_id
- tipc_remote_management
- handler_enabled
The above global variables are read often, but written rarely. Use
__read_mostly to prevent them being on the same cacheline as another
variable which is written to often, which would cause cacheline
bouncing.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is nothing changing this variable dynamically, so change
it to a macro to make that more obvious when reading the code.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now tipc_net_start() always returns a success code - 0, its
return value type should be changed from integer to void, which can
avoid unnecessary check for its return value.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After eliminating the mechanism which checks whether all letters
in media name string are within a given character set, the
media_name_valid routine becomes trivial. It is also only
used once, so it is unnecessary to keep it as a separate function.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no real reason to check whether all letters in the given
media name and network interface name are within the character set
defined in tipc_alphabet array. Even if we eliminate the checking,
the rest of checking conditions in tipc_enable_bearer() can ensure
we do not enable an invalid or illegal bearer.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:
[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(ptype_lock);
local_irq_disable();
lock(tipc_net_lock);
lock(ptype_lock);
<Interrupt>
lock(tipc_net_lock);
*** DEADLOCK ***
the shortest dependencies between 2nd lock and 1st lock:
-> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
[<c1089418>] __lock_acquire+0x528/0x13e0
[<c108a360>] lock_acquire+0x90/0x100
[<c1553c38>] _raw_spin_lock+0x38/0x50
[<c14651ca>] dev_add_pack+0x3a/0x60
[<c182da75>] arp_init+0x1a/0x48
[<c182dce5>] inet_init+0x181/0x27e
[<c1001114>] do_one_initcall+0x34/0x170
[<c17f7329>] kernel_init+0x110/0x1b2
[<c155b6a2>] kernel_thread_helper+0x6/0x10
[...]
... key at: [<c17e4b10>] ptype_lock+0x10/0x20
... acquired at:
[<c108a360>] lock_acquire+0x90/0x100
[<c1553c38>] _raw_spin_lock+0x38/0x50
[<c14651ca>] dev_add_pack+0x3a/0x60
[<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
[<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
[<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
[<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
[<c148e802>] genl_rcv_msg+0x232/0x280
[<c148d3f6>] netlink_rcv_skb+0x86/0xb0
[<c148e5bc>] genl_rcv+0x1c/0x30
[<c148d144>] netlink_unicast+0x174/0x1f0
[<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
[<c1456bc1>] sock_aio_write+0x161/0x170
[<c1135a7c>] do_sync_write+0xac/0xf0
[<c11360f6>] vfs_write+0x156/0x170
[<c11361e2>] sys_write+0x42/0x70
[<c155b0df>] sysenter_do_call+0x12/0x38
[...]
}
-> (tipc_net_lock){+..-..} ops: 4 {
[...]
IN-SOFTIRQ-R at:
[<c108953a>] __lock_acquire+0x64a/0x13e0
[<c108a360>] lock_acquire+0x90/0x100
[<c15541cd>] _raw_read_lock_bh+0x3d/0x50
[<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
[<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
[<c146a5fa>] __netif_receive_skb+0x22a/0x590
[<c146ab0b>] netif_receive_skb+0x2b/0xf0
[<c13c43d2>] pcnet32_poll+0x292/0x780
[<c146b00a>] net_rx_action+0xfa/0x1e0
[<c103a4be>] __do_softirq+0xae/0x1e0
[...]
}
>From the log, we can see three different call chains between
CPU0 and CPU1:
Time 0 on CPU0:
kernel_init()->inet_init()->dev_add_pack()
At time 0, the ptype_lock is held by CPU0 in dev_add_pack();
Time 1 on CPU1:
tipc_enable_bearer()->enable_bearer()->dev_add_pack()
At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack. But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.
Time 2 on CPU0:
netif_receive_skb()->recv_msg()->tipc_recv_msg()
At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock. At the moment, below scenario happens:
On CPU0, below is our sequence of taking locks:
lock(ptype_lock)->lock(tipc_net_lock)
On CPU1, our sequence of taking locks looks like:
lock(tipc_net_lock)->lock(ptype_lock)
Obviously deadlock may happen in this case.
But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled. Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.
To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethernet media initialization is only done when TIPC is started or
switched to network mode. So the initialization of the network device
notifier structure can be moved out of this function and done
statically instead.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal log buffer handling functions can now safely be
removed since there is no code using it anymore. Requests to
interact with the internal tipc log buffer over netlink (in
config.c) will report 'obsolete command'.
This represents the final removal of any references to a
struct print_buf, and the removal of the struct itself.
We also get rid of a TIPC specific Kconfig in the process.
Finally, log.h is removed since it is not needed anymore.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does. It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer. All callers of
this function that used to pass a print_buf are updated.
Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.
Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate. This function is removed since it
is no longer referenced nor needed.
A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed. This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.
The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
tipc_printf was previously used both to construct debug traces
and to append data to buffers that should be sent over netlink
to the tipc-config application. A global print_buffer was
used to format the string before it was copied to the actual
output buffer. This could lead to concurrent access of the
global print_buffer, which then had to be lock protected.
This is simplified by changing tipc_printf to append data
directly to the output buffer using vscnprintf.
With the new implementation of tipc_printf, there is no longer
any risk of concurrent access to the internal log buffer, so
the lock (and the comments describing it) are no longer
strictly necessary. However, there are still a few functions
that do grab this lock before resizing/dumping the log
buffer. We leave the lock, and these functions untouched since
they will be removed with a subsequent commit that drops the
deprecated log buffer handling code
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
To pave the way for a pending cleanup of tipc_printf, and
removal of struct print_buf entirely, we make that task simpler
by converting link_print to issue its messages with standard
printk infrastructure. [Original idea separated from a larger
patch from Erik Hugne <erik.hugne@ericsson.com>]
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The link queue traces and packet level debug functions served
a purpose during early development, but are now redundant
since there are other, more capable tools available for
debugging at the packet level.
The TIPC_DEBUG Kconfig option is removed since it does not
provide any extra debugging features anymore.
This gets rid of a lot of tipc_printf usages, which will
make the pending cleanup work of that function easier.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
All messages should go directly to the kernel log. The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.
Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.
Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG). The latter will be
phased out completely
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
With the default name table size of 1024, it is possible that
the sanity check in tipc_nametbl_stop could spam out 1024
essentially identical error messages if memory was corrupted
or similar. Limit it to issuing no more than a single message.
The actual chain number (i.e. 0 --> 1023) wouldn't provide any
useful insight if/when such an instance happened, so don't
bother printing out that value.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This is done to improve readability, and so that we can give
the struct a name that will allow us to declare a local
pointer to it in code, instead of having to always redirect
through the link struct to get to it.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.
For example, this cast:
int y;
int *p = (int *)&y;
I used the coccinelle script below to find and remove these
unnecessary casts. I manually removed the conversions this
script produces of casts with __force and __user.
@@
type T;
T *p;
@@
- (T *)p
+ p
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the comment blocks are floating in limbo between two
functions, or between blocks of code. Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel. Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.
This is a 100% cosmetic change with no runtime impact. We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds check to ensure TIPC sockets reject incoming payload messages
that have an unrecognized message type.
Remove the old open question about whether TIPC_ERR_NO_PORT is
the proper return value. It is appropriate here since there are
valid instances where another node can make use of the reply,
and at this point in time the host is already broadcasting TIPC
data, so there are no real security concerns.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Consolidates validation of scope and name sequence range values into
a single routine where it applies both to local name publications
and to name publications issued by other nodes in the network. This
change means that the scope value for non-local publications is now
validated and the name sequence range for local publications is now
validated only once. Additionally, a publication attempt that fails
validation now creates an entry in the system log file only if debugging
capabilities have been enabled; this prevents the system log from being
cluttered up with messages caused by a defective application or network
node.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Replaces two identical chunks of code that delete an unused name
sequence structure from TIPC's name table with calls to a new routine
that performs this operation.
This change is cosmetic and doesn't impact the operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminate code to zero-out the main topology service structure,
which is already zeroed-out.
Get rid of a comment documenting a field of the main topology
service structure that no longer exists.
Both are cosmetic changes with no impact on runtime behaviour.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet. With the
current codebase, the deferral is no longer necessary.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Streamlines the job of re-initializing TIPC's network topology service
when a node's network address is first assigned. Rather than destroying
the topology server port and breaking its connections to existing
subscribers, TIPC now simply lets the service continue running (since
the change to the port identifier of each port used by the topology
service no longer impacts the flow of messages between the service and
its subscribers).
This enhancement means that applications that utilize the topology
service prior to the assignment of TIPC's network address no longer need
to re-establish their subscriptions when the address is finally assigned.
However, it is worth noting that any subsequent events for existing
subscriptions report the new port identifier of the publishing port,
rather than the original port identifier. (For example, a name that was
previously reported as being published by <0.0.0:ref> may be subsequently
withdrawn by <Z.C.N:ref>.)
This doesn't impact any of the existing known userspace in tipc-utils,
since (a) TIPC continues to treat references to the original port ID
correctly and (b) normal use cases assign an address before active use.
However if there does happen to be some rare/custom application out
there that was relying on this, they can simply bypass the enhancement
by issuing a subscription to {0,0} and break its connection to the
topology service, if an associated withdrawal event occurs.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Termination no longer tests to see if the configuration service
port was successfully created or not. In the unlikely event that the
port was not created, attempting to delete the non-existent port is
detected gracefully and causes no harm.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet. With the
current codebase, the deferral is no longer necessary.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Streamlines the job of re-initializing TIPC's configuration service
when a node's network address is first assigned. Rather than destroying
the configuration server port and then recreating it, TIPC now simply
withdraws the existing {0,<0.0.0>} name publication and creates a new
{0,<Z.C.N>} name publication that identifies the node's network address
to interested subscribers.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Untie gcc's hands and let it do what it wants within the
individual source files. There are two files, node.c and
port.c -- only the latter effectively changes (gcc-4.5.2).
Objdump shows gcc deciding to not inline port_peernode().
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.
No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhances command validation done by TIPC's configuration service so
that it works properly even if the node's network address is changed in
mid-operation. The default node address of <0.0.0> is now recognized as an
alias for "this node" even after a new network address has been assigned.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Revises handling of a rejected message to ensure that a locally
originated message is returned properly even if the node's network
address is changed in mid-operation. The routine now treats the
default node address of <0.0.0> as an alias for "this node" when
determining where to send a returned message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Revises handling of send routines for payload messages to ensure that
they are processed properly even if the node's network address is
changed in mid-operation. The routines now treat the default node
address of <0.0.0> as an alias for "this node" when determining where
to send an outgoing message.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There are two send routines that might conceivably be asked by an
application to send a message off-node when the node is still using
the default network address. These now have an added check that
detects this and rejects the message gracefully.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The routine that changes the node's network address now takes TIPC's
network lock in write mode while the main address variable and associated
data structures are being changed; this is needed to ensure that the
link subsystem won't attempt to send a message off-node until the sending
port's message header template has been updated with the node's new
network address.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Revises routines that deal with connections between two ports on
the same node to ensure the connection is not impacted if the node's
network address is changed in mid-operation. The routines now treat
the default node address of <0.0.0> as an alias for "this node" in
the following situations:
1) Incoming messages destined to a connected port now handle the alias
properly when validating that the message was sent by the expected
peer port, ensuring that the message will be accepted regardless of
whether it specifies the node's old network address or it's current one.
2) The code which completes connection establishment now handles the
alias properly when determining if the peer port is on the same node
as the connected port.
An added benefit of addressing issue 1) is that some peer port
validation code has been relocated to TIPC's socket subsystem, which
means that validation is no longer done twice when a message is
sent to a non-socket port (such as TIPC's configuration service or
network topology service).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Prior to commit 23dd4cce38
"tipc: Combine port structure with tipc_port structure"
there was a need for the two sets of helper functions. But
now they are just duplicates. Remove the globally visible
ones, and mark the remaining ones as inline.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Re-orders port creation logic so that the initialization of a new
port's message header template occurs while the port list lock is
held. This ensures that a change to the node's network address that
occurs at the same time as the port is being created does not result
in the template identifying the sender using the former network
address. The new approach guarantees that the new port's template is
using the current network address or that it will be updated when
the address changes.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes an unnecessary check in the logic that updates the message
header template for existing ports when a node's network address is
first assigned. There is no longer any need to check to see if the
node's network address has actually changed since the calling routine
has already verified that this is so.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Revises routines that add and remove an entry from a node's name table
so that the publication scope lists are updated properly even if the
node's network address is changed in mid-operation. The routines now
recognize the default node address of <0.0.0> as an alias for "this node"
even after a new network address has been assigned.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Introduces routines that test whether a given network address is
equal to a node's own network address or if it lies within the node's
own network cluster, and which work properly regardless of whether
the node is using the default network address <0.0.0> or a non-zero
network address that is assigned later on. In essence, these routines
ensure that address <0.0.0> is treated as an alias for "this node",
regardless of which network address the node is actually using.
Old users of the pre-existing more strict match in_own_cluster()
have been accordingly redirected to what is now called
in_own_cluster_exact() --- which does not extend matching to <0,0,0>.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
No longer increments counter of number of publications by a node
if an attempt to add a new publication fails. This prevents TIPC from
incorrectly blocking future publications because the configured maximum
number of publications has been reached.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensures that node-scope name publications that exist prior to the
configuration of a node's network address are properly re-initialized
with that address when it is assigned. TIPC's node-scope publications
are now tracked using a publications list like the lists used for
cluster-scope and zone-scope publications so they can be easily updated
when required.
The inclusion of node scope name publications in a conventional publication
list means that they must now also be withdrawn, just like cluster and zone
scope publications are currently withdrawn. So some conditional tests on
scope ==/!= TIPC_NODE_SCOPE are inserted/removed accordingly.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Utilizes distinct lists to track zone-scope and cluster-scope names
published by a node. For now, TIPC continues to process the entries
in both lists in the same way; however, an upcoming patch will utilize
the existence of the lists to prevent the sending of cluster-scope names
to nodes that are not part of the local cluster.
To achieve this, an array of publication lists is introduced, so
that they can be iterated over and accessed via publ->scope as
an index where convenient.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This is done so that it can be reused with differing publication
lists, instead of being hard coded to the cluster publicaton list.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There is currently a single list that is containing both cluster-scope and
zone-scope publications, and the list count is a separate free floating
variable. Create a struct to bind the count to the list, and to pave
the way for factoring out the publications into zone/cluster/node scope.
The current "publ_root" most matches what will be the cluster scope
list, so it is named accordingly in this commit.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimizes routines that send payload messages so that they no longer
update the "originating node" and "originating port" fields of the
outgoing message header template, since these fields are initialized
when the sending port is created and never change thereafter. Also
optimizes the routine which updates the message header template when
a connection to a port is established, for the same reason.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes code that updated the "previous node" field of an out-going
message over TIPC's links. Such updating is unnecessary since the
removal of the prototype multi-cluster capability means that all
outgoing messages are generated locally and already have this field
populated correctly.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Converts a non-trivial routine from inline to non-inline form
to avoid bloating the TIPC code base with 6 copies of its body.
This change is essentially cosmetic, and doesn't change existing
TIPC behavior.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes all references to the global variable that records whether
TIPC is running in "single node" mode or "network" mode, since this
information can be easily deduced from the global variable that
records TIPC's network address. (i.e. a non-zero network address
means that TIPC is running in network mode.)
The changes made update most existing mode-based checks to use the
network address global variable. A few checks that are no longer
needed are removed entirely, along with any associated code lying on
non-executable control paths.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes all references to TIPC's "not running" mode, since the
removal of support for the native API means that there is no longer
any way to interact with TIPC if it has not been initialized.
The changes made consist of removing mode-based checks that are no
longer needed, along with any associated code lying on non-executable
control paths.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Restores name table translation using a non-zero domain that is
"out of scope", which was broken by an earlier commit
(5d9c54c1e9). Comments have now been
added to the name table translation routine to make it clear that
there are actually three possible outcomes to a translation request
(found/not found/deferred), rather than just two (found/not found).
Note that a straightforward revert of the earlier commit is not
possible, as other changes to the name table translation logic
have occurred since the incorrect optimization was made.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Optimizes processing done when contact with a neighboring node is
established to avoid recording the current state of outgoing broadcast
messages if the neighboring node isn't a valid broadcast link destination,
since this state information isn't needed for such nodes.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates a block of comments that describe how routing table updates
are to be handled. These comments no longer apply following the removal
of TIPC's prototype multi-cluster support.
Note that these changes are essentially cosmetic in nature, and have
no impact on the actual operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Gets rid of two inlined routines that simply call existing sk_buff
manipulation routines, since there is no longer any extra processing
done by the helper routines.
Note that these changes are essentially cosmetic in nature, and have
no impact on the actual operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Relocates information about the size of TIPC's node table index and
its associated hash function, since only node subsystem routines need
to have access to this information.
Note that these changes are essentially cosmetic in nature, and have
no impact on the actual operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Simplifies a comparison operation to eliminate a useless test that
checks if an unsigned value is less than zero.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This "shortform" is actually longer than typing out what it is really
trying to do, and just makes reading the code more difficult, so
lets simply shoot it in the head.
In the case of log.c - the comparison is on a u32, so we can drop the
check for < 0 at the same time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds a new check to TIPC's name table logic to reject any attempt to
create a new name publication that is identical to an existing one.
(Such an attempt will never happen under normal circumstances, but
could arise if another network node malfunctions and issues a duplicate
name publication message.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Streamlines the logic that prevents an application from binding a
reserved TIPC name type to a port by moving the check to the code
that handles a socket bind() operation. This allows internal TIPC
subsystems to bind a reserved name without having to set an atomic
flag to gain permission to use such a name. (This simplification is
now possible due to the elimination of support for TIPC's native API.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Eliminates a check in the processing of TIPC messages arriving from
off node that ensures the message is destined for this node, since this
check duplicates an earlier check. (The check would be necessary if TIPC
needed to be able to route incoming messages to another node, but the
elimination of multi-cluster support means that this never happens and
all incoming messages are consumed by the receiving node.)
Note: This change involves the elimination of a single "if" statement
with a large "then" clause; consequently, a significant number of lines
end up getting re-indented. In addition, a simple message header access
routine that is no longer referenced is eliminated. However, the only
functional change is the elimination of the single check described above.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Utilizes the new "node signature" field in neighbor discovery messages
to ensure that all links TIPC associates with a given <Z.C.N> network
address belong to the same neighboring node. (Previously, TIPC could not
tell if link setup requests arriving on different interfaces were from
the same node or from two different nodes that has mistakenly been assigned
the same network address.)
The revised algorithm for detecting a duplicate node considers both the
node signature and the network interface adddress specified in a request
message when deciding how to respond to a link setup request. This prevents
false alarms that might otherwise arise during normal network operation
under the following scenarios:
a) A neighboring node reboots. (The node's signature changes, but the
network interface address remains unchanged.)
b) A neighboring node's network interface is replaced. (The node's signature
remains unchanged, but the network interface address changes.)
c) A neighboring node is completely replaced. (The node's signature and
network interface address both change.)
The algorithm also handles cases in which a node reboots and re-establishes
its links to TIPC (or begins re-establishing those links) before TIPC
detects that it is using a new node signature. In such cases of "delayed
rediscovery" TIPC simply accepts the new signature without disrupting
communication that is already underway over the links.
Thanks to Laser [gotolaser@gmail.com] for his contributions to the
development of this enhancement.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds support for the new "node signature" in neighbor discovery messages,
which is a 16 bit identifier chosen randomly when TIPC is initialized.
This field makes it possible for nodes receiving a neighbor discovery
message to detect if multiple neighboring nodes are using the same network
address (i.e. <Z.C.N>), even when the messages are arriving on different
interfaces.
This first phase of node signature support creates the signature,
incorporates it into outgoing neighbor discovery messages, and tracks
the signature used by valid neighbors. An upcoming patch builds on this
foundation to implement the improved duplicate neighbor detection checking.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies message rejection logic so that TIPC doesn't attempt to
send a FIN message to the rejecting port if it is known in advance
that there is no such message because the rejecting port doesn't exist.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Removes code that alters the publication key of a name table entry
that is being forcibly purged from TIPC's name table after contact
with the publishing node has been lost.
Current TIPC ensures that all defunct names are purged before
re-establishing contact with a failed node. There used to be a risk
that the publication might be accidentally deleted because it might be
re-added to the name table before the purge operation was completed.
But now there is no longer a need to ensure that the new key is different
than the old one.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies broadcast link so that an incoming fragmented message is not
lost if reassembly cannot begin because there currently is no buffer
big enough to hold the entire reassembled message. The broadcast link
now ignores the first fragment completely, which causes the sending node
to retransmit the first fragment so that reassembly can be re-attempted.
Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.
To do this cleanly without duplicaton, a new bclink_accept_pkt()
function is introduced.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies unicast link endpoint logic so an incoming fragmented message
is not lost if reassembly cannot begin because there is no buffer big
enough to hold the entire reassembled message. The link endpoint now
ignores the first fragment completely, which causes the sending node to
retransmit the first fragment so that reassembly can be re-attempted.
Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Eliminates support for the broadcast tag field, which is no longer
used by broadcast link NACK messages.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Completely redesigns broadcast link ACK and NACK mechanisms to prevent
spurious retransmit requests in dual LAN networks, and to prevent the
broadcast link from stalling due to the failure of a receiving node to
acknowledge receiving a broadcast message or request its retransmission.
Note: These changes only impact the timing of when ACK and NACK messages
are sent, and not the basic broadcast link protocol itself, so inter-
operability with nodes using the "classic" algorithms is maintained.
The revised algorithms are as follows:
1) An explicit ACK message is still sent after receiving 16 in-sequence
messages, and implicit ACK information continues to be carried in other
unicast link message headers (including link state messages). However,
the timing of explicit ACKs is now based on the receiving node's absolute
network address rather than its relative network address to ensure that
the failure of another node does not delay the ACK beyond its 16 message
target.
2) A NACK message is now typically sent only when a message gap persists
for two consecutive incoming link state messages; this ensures that a
suspected gap is not confirmed until both LANs in a dual LAN network have
had an opportunity to deliver the message, thereby preventing spurious NACKs.
A NACK message can also be generated by the arrival of a single link state
message, if the deferred queue is so big that the current message gap
cannot be the result of "normal" mis-ordering due to the use of dual LANs
(or one LAN using a bonded interface). Since link state messages typically
arrive at different nodes at different times the problem of multiple nodes
issuing identical NACKs simultaneously is inherently avoided.
3) Nodes continue to "peek" at NACK messages sent by other nodes. If
another node requests retransmission of a message gap suspected (but not
yet confirmed) by the peeking node, the peeking node forgets about the
gap and does not generate a duplicate retransmit request. (If the peeking
node subsequently fails to receive the lost message, later link state
messages will cause it to rediscover and confirm the gap and send another
NACK.)
4) Message gap "equality" is now determined by the start of the gap only.
This is sufficient to deal with the most common cases of message loss,
and eliminates the need for complex end of gap computations.
5) A peeking node no longer tries to determine whether it should send a
complementary NACK, since the most common cases of message loss don't
require it to be sent. Consequently, the node no longer examines the
"broadcast tag" field of a NACK message when peeking.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensures that all attempts to update broadcast link statistics are done
only while holding the lock that protects the link's main data structures,
to prevent interference by simultaneous updates caused by messages
arriving on other interfaces.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies broadcast link so that it increments the "received duplicate
message" count if an incoming message cannot be added to the deferred
message queue because it is already present in the queue. (The aligns
broadcast link behavior with that of TIPC's unicast links.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fixes a pair of problems in broadcast link message reception code
relating to the reclamation of the node lock after consuming an
in-sequence message.
1) Now retests to see if the sending node is still up after reclaiming
the node lock, and bails out if it is non-operational.
2) Now manipulates the node's deferred message queue only after
reclaiming the node lock, rather than using queue head pointer
information that was cached previously.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensures that any attempt to send a NACK message over TIPC's broadcast
link has exclusive access to the link's main data structures, to prevent
interference with a simultaneous attempt to send other broadcast link
traffic (such as application-generated multicast messages).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Corrects a problem in which a link endpoint that activates as the
result of receiving a RESET/STATE sequence of link protocol messages
fails to properly record the broadcast link status information about
the node to which it is now communicating with. (The problem does
not occur with the more common RESET/ACTIVATE sequence of messages.)
The fix ensures that the broadcast link status info is updated after
the RESET message resets the link endpoint, rather than before, thereby
preventing new information from being overwritten by the reset operation.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fix a bug that can prevent TIPC from sending broadcast messages to a node
if contact with the node is lost and then regained. The problem occurs if
the broadcast link first clears the flag indicating the node is part of the
link's distribution set (when it loses contact with the node), and later
fails to restore the flag (when contact is regained); restoration fails
if contact with the node is regained by implicit unicast link activation
triggered by the arrival of a data message, rather than explicitly by the
arrival of a link activation message.
The broadcast link now uses separate fields to track whether a node is
theoretically capable of receiving broadcast messages versus whether it is
actually part of the link's distribution set. The former member is updated
by the receipt of link protocol messages, which can occur at any time; the
latter member is updated only when contact with the node is gained or lost.
This change also permits the simplification of several conditional
expressions since the broadcast link's "supported" field can now only be
set if there are working links to the associated node.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensure that sequence number information about incoming broadcast link
messages is initialized only by the activation of the first link to a
given cluster node. Previously, a race condition allowed reset and/or
activation messages for a second link to re-initialize this sequence
number information with obsolete values. This could trigger TIPC to
request the retransmission of previously acknowledged broadcast link
messages from that node, resulting in broadcast link processing becoming
stalled if the node had already released one or more of those messages
and was unable to perform the required retransmission.
Thanks to Laser <gotolaser@gmail.com> for identifying this problem
and assisting in the development of this fix.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Ensures that a link endpoint discards any previously deferred link
protocol message whenever it attempts to send a new one.
Previously, it was possible for a link protocol message that was unsent
due to congestion to be transmitted after newer protocol messages had
been sent. The stale link protocol message might then cause the receiving
link endpoint to malfunction because of its outdated conent.
Thanks to Osamu Kaminuma [okaminum@avaya.com] for diagnosing the problem
and contributing a prototype patch.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Re-code the algorithm for inserting an out-of-sequence message into
a unicast or broadcast link's deferred message queue. It remains
functionally equivalent but should be easier to understand/maintain.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The addition of the "s" to indicate pluralization is intentional,
since the struct actually contains two name variants.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This changes both the struct bcbearer and struct bcbearer_pair to
have the "tipc_" prefix. Runtime behaviour is unchanged.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Give it a meaningful prefix, as suggested by DaveM, so that it
is consistent with things like struct tipc_bearer, and so it isn't
confused with anything else. This has no impact on the actual
runtime code behaviour.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Migrates the buf_seqno() helper routine from broadcast link level to
unicast link level so that it can be used both types of TIPC links.
This is a cosmetic change only, and does not affect the operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds checks to TIPC's broadcast link so that it ignores any
acknowledgement message containing a sequence number that does not
correspond to an unacknowledged message currently in the broadcast
link's transmit queue.
This change prevents the broadcast link from becoming stalled if a
newly booted node receives stale broadcast link acknowledgement
information from another node that has not yet fully synchronized
its end of the broadcast link to reflect the current state of the
new node's end.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds code to release any unsent broadcast messages in the broadcast link
transmit queue if TIPC loses contact with its only neighboring node.
Previously, a broadcast link that was in the congested state would hold
on to the unsent messages, even though the messages were now undeliverable.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The two broadcast link statistics fields that are used to derive the
average length of that link's transmit queue are now updated only after
a successful attempt to send a broadcast message, since there is no need
to update these values when an unsuccessful send attempt leaves the
queue unchanged.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds a check to detect when an attempt is made to send a message
via the broadcast link and no neighboring nodes are currently available
to receive it. Rather than wasting effort passing the message to the
broadcast link and broadcast bearer, who will only throw it away,
TIPC now frees the message immediately and reports success (i.e. the
message has been delivered to all available destinations).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fixes oversight that allowed broadcast link node map to be updated without
first taking the broadcast link spinlock that protects the map. As part
of this fix the node map has been incorporated into the broadcast link
structure to make the need for such protection more evident.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Creates global variables to hold the broadcast link's pseudo-bearer and
pseudo-link structures, rather than allocating them dynamically. There
is only a single instance of each structure, and changing over to static
allocation allows elimination of code to handle the cases where dynamic
allocation was unsuccessful.
The memset in the teardown code may look like they aren't used, but
the same teardown code is run when there is a non-fatal error at
init-time, so that stale data isn't present when the user fixes the
cause of the soft error.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Gets rid of an unnecessary check in the routine that updates the port id
of a node's name publications when the node is assigned a network address,
since the routine is only invoked if the new address is different from
the existing one.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies TIPC's module unloading logic to switch itself into "single
node" mode before starting to terminate networking support. This helps
to ensure that no operations that require TIPC to be in "networking"
mode can initiate once unloading starts.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Gets rid of two pointless operations that zero out the array used to
record information about TIPC's Ethernet bearers. There is no need to
initialize the array on start up since it is a global variable that is
already zero'd out, and there is no need to zero it out on exit because
the array is never referenced again.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies Ethernet bearer disable logic to break the association between
the bearer and its device driver at the time the bearer is disabled,
rather than when the TIPC module is unloaded. This allows the array
entry used by the disabled bearer to be re-used if the same bearer (or
a different one) is subsequently enabled.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Change TIPC's shutdown code to deactivate generic networking support
before terminating Ethernet media support. The deactivation of generic
networking support causes all existing bearers to be destroyed, meaning
the Ethernet media termination routine no longer has to bother marking
them as unavailable.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Comment-only change to better explain why TIPC's configuration lock is
temporarily released while activating support for network interfaces,
and why the existing activation code doesn't require rework.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Permits run-time alteration of default link settings on a per-media
and per-bearer basis, in addition to the existing per-link basis.
The following syntax can now be used:
tipc-config -lt=<link-name|bearer-name|media-name>/<tolerance>
tipc-config -lp=<link-name|bearer-name|media-name>/<priority>
tipc-config -lw=<link-name|bearer-name|media-name>/<window>
Note that changes to the default settings for a given media type has
no effect on the default settings used by existing bearers. Similarly,
changes to default bearer settings has no effect on existing link
endpoints that utilize that interface.
Thanks to Florian Westphal <fw@strlen.de> for his contributions to
the development of this enhancement.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds a check to ensure that TIPC ignores an incoming neighbor discovery
message that specifies an invalid media address as its source. The check
ensures that the source address is a valid, non-broadcast address that
could legally be used by a neighboring link endpoint.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reworks TIPC's media address data structure and associated processing
routines to transfer all media-specific details of address conversion
to the associated TIPC media adaptation code. TIPC's generic bearer code
now only needs to know which media type an address is associated with
and whether or not it is a broadcast address, and totally ignores the
"value" field that contains the actual media-specific addressing info.
These changes eliminate the need for a number of endianness conversion
operations and will make it easier for TIPC to support new media types
in the future.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enhances TIPC's Ethernet media support to provide 3 new address conversion
routines, which allow TIPC to interpret an address that is in string form
and to convert an address to and from the 20 byte format used in TIPC's
neighbor discovery messages.
These routines are pre-requisites to a follow on commit that hides all
media-specific addressing details from TIPC's generic bearer code.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enhances conversion of a media address to printable form so that an
unconvertable address will be displayed as a string of hex digits,
rather than not being displayed at all. (Also removes a pointless check
for the existence of the media-specific address conversion routine,
since the routine is not optional.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Simplifies error handling performed during media registration, since
TIPC no longer supports the dynamic addition of new media types that
are potentially error-prone. These simplifications include the following:
1) No longer check for premature registration of a new media type.
2) No longer check for negative link priority values (which was pointless
since such values are unsigned, and could cause a compiler warning).
3) No longer generate a warning describing the exact cause of any
registration failure (just warns that overall registration failed).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Changes TIPC's list of registered media types from an array of media
structures to an array of pointers to media structures. This eliminates
the need to copy of the contents of the structure passed in during media
registration.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Streamlines the detection of an attempt to register a TIPC media structure
using an already registered name or type identifier. The revised logic now
reuses an existing routine to detect an existing name and no longer
unnecessarily manipulates the media type counter during an unsuccessful
registration attempt.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Speeds up the registration of TIPC media types by passing in a structure
containing the required information, rather than by passing in the various
fields describing the media type individually.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>