IPv4 side of the problem was addressed in commit a9e050f4e7
(net: tcp: GRO should be ECN friendly)
This patch does the same, but for IPv6 : A Traffic Class mismatch
doesnt mean flows are different, but instead should force a flush
of previous packets.
This patch removes artificial packet reordering problem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The retry loop in neigh_resolve_output() and neigh_connected_output()
call dev_hard_header() with out reseting the skb to network_header.
This causes the retry to fail with skb_under_panic. The fix is to
reset the network_header within the retry loop.
Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com>
Reviewed-by: Shawn Lu <shawn.lu@ericsson.com>
Reviewed-by: Robert Coulson <robert.coulson@ericsson.com>
Reviewed-by: Billie Alsup <billie.alsup@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Over time, skb recycling infrastructure got litle interest and
many bugs. Generic rx path skb allocation is now using page
fragments for efficient GRO / TCP coalescing, and recyling
a tx skb for rx path is not worth the pain.
Last identified bug is that fat skbs can be recycled
and it can endup using high order pages after few iterations.
With help from Maxime Bizon, who pointed out that commit
87151b8689 (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.
Instead of fixing this bug, lets remove skb recycling.
Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate sk_buff right before netif_receive_skb()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
I get a panic when I use ss -a and rmmod inet_diag at the
same time.
It's because netlink_dump uses inet_diag_dump which belongs to module
inet_diag.
I search the codes and find many modules have the same problem. We
need to add a reference to the module which the cb->dump belongs to.
Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.
Change From v3:
change netlink_dump_start to inline,suggestion from Pablo and
Eric.
Change From v2:
delete netlink_dump_done,and call module_put in netlink_dump
and netlink_sock_destruct.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David Miller:
"The most important bit in here is the fix for input route caching from
Eric Dumazet, it's a shame we couldn't fully analyze this in time for
3.6 as it's a 3.6 regression introduced by the routing cache removal.
Anyways, will send quickly to -stable after you pull this in.
Other changes of note:
1) Fix lockdep splats in team and bonding, from Eric Dumazet.
2) IPV6 adds link local route even when there is no link local
address, from Nicolas Dichtel.
3) Fix ixgbe PTP implementation, from Jacob Keller.
4) Fix excessive stack usage in cxgb4 driver, from Vipul Pandya.
5) MAC length computed improperly in VLAN demux, from Antonio
Quartulli."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
ipv6: release reference of ip6_null_entry's dst entry in __ip6_del_rt
Remove noisy printks from llcp_sock_connect
tipc: prevent dropped connections due to rcvbuf overflow
silence some noisy printks in irda
team: set qdisc_tx_busylock to avoid LOCKDEP splat
bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
sctp: check src addr when processing SACK to update transport state
sctp: fix a typo in prototype of __sctp_rcv_lookup()
ipv4: add a fib_type to fib_info
can: mpc5xxx_can: fix section type conflict
can: peak_pcmcia: fix error return code
can: peak_pci: fix error return code
cxgb4: Fix build error due to missing linux/vmalloc.h include.
bnx2x: fix ring size for 10G functions
cxgb4: Dynamically allocate memory in t4_memory_rw() and get_vpd_params()
ixgbe: add support for X540-AT1
ixgbe: fix poll loop for FDIRCTRL.INIT_DONE bit
ixgbe: fix PTP ethtool timestamping function
ixgbe: (PTP) Fix PPS interrupt code
ixgbe: Fix PTP X540 SDP alignment code for PPS signal
...
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9c650ffc ("TTY: ircomm_tty, add tty install") split _open() to
_install() and _open(). It also moved the initialization of driver_data
out of open(), but never added it to install() - causing a NULL ptr
deref whenever the driver was used.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
as we hold dst_entry before we call __ip6_del_rt,
so we should alse call dst_release not only return
-ENOENT when the rt6_info is ip6_null_entry.
and we already hold the dst entry, so I think it's
safe to call dst_release out of the write-read lock.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Validation of userspace input shouldn't trigger dmesg spamming.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When large buffers are sent over connected TIPC sockets, it
is likely that the sk_backlog will be filled up on the
receiver side, but the TIPC flow control mechanism is happily
unaware of this since that is based on message count.
The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
and drop it's side of the connection, leaving it stale on
the receiver end.
By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
overload caused by a full backlog queue and the flow control
will work properly.
This worst case value is the max TIPC message size times
the flow control window, multiplied by two because a sender
will transmit up to double the window size before a port is marked
congested.
We multiply this by 2 to account for the sk_buff and other overheads.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuzzing causes these printks to spew constantly.
Changing them to DEBUG statements is consistent with other usage in the file,
and makes them disappear when CONFIG_IRDA_DEBUG is disabled.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppose we have an SCTP connection with two paths. After connection is
established, path1 is not available, thus this path is marked as inactive. Then
traffic goes through path2, but for some reasons packets are delayed (after
rto.max). Because packets are delayed, the retransmit mechanism will switch
again to path1. At this time, we receive a delayed SACK from path2. When we
update the state of the path in sctp_check_transmitted(), we do not take into
account the source address of the SACK, hence we update the wrong path.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just to avoid confusion when people only reads this prototype.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit d2d68ba9fe (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.
This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.
David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.
With help from Julian Anastasov who provided very helpful
hints, reproduced here :
<quote>
Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:
broadcast 192.168.0.255 dev DEV proto kernel scope link src
192.168.0.1
192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1
local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1
The "dev DEV proto kernel scope link src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.
And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.
</quote>
Many thanks to Chris Clayton for his patience and help.
Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
skb_reset_mac_len() relies on the value of the skb->network_header pointer,
therefore we must wait for such pointer to be recalculated before computing
the new mac_len value.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route
to fe80::/64 is added in the main table:
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
This route does not match any prefix (no fe80:: address on lo). In fact,
addrconf_dev_config() will not add link local address because this function
filters interfaces by type. If the link local address is added manually, the
route to the link local prefix will be automatically added by
addrconf_add_linklocal().
Note also, that this route is not deleted when the address is removed.
After looking at the code, it seems that addrconf_add_lroute() is redundant with
addrconf_add_linklocal(), because this function will add the link local route
when the link local address is configured.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network merge brought in a few users of functions that got
deprecated by the workqueue cleanups: the 'system_nrt_wq' is now the
same as the regular system_wq, since all workqueues are now non-
reentrant.
Similarly, remove one use of flush_work_sync() - the regular
flush_work() has become synchronous, and the "_sync()" version is thus
deprecated as being superfluous.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
Pull cgroup hierarchy update from Tejun Heo:
"Currently, different cgroup subsystems handle nested cgroups
completely differently. There's no consistency among subsystems and
the behaviors often are outright broken.
People at least seem to agree that the broken hierarhcy behaviors need
to be weeded out if any progress is gonna be made on this front and
that the fallouts from deprecating the broken behaviors should be
acceptable especially given that the current behaviors don't make much
sense when nested.
This patch makes cgroup emit warning messages if cgroups for
subsystems with broken hierarchy behavior are nested to prepare for
fixing them in the future. This was put in a separate branch because
more related changes were expected (didn't make it this round) and the
memory cgroup wanted to pull in this and make changes on top."
* 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
Pull cgroup updates from Tejun Heo:
- xattr support added. The implementation is shared with tmpfs. The
usage is restricted and intended to be used to manage per-cgroup
metadata by system software. tmpfs changes are routed through this
branch with Hugh's permission.
- cgroup subsystem ID handling simplified.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
cgroup: Assign subsystem IDs during compile time
cgroup: Do not depend on a given order when populating the subsys array
cgroup: Wrap subsystem selection macro
cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
cgroup: net_prio: Do not define task_netpioidx() when not selected
cgroup: net_cls: Do not define task_cls_classid() when not selected
cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
xattr: mark variable as uninitialized to make both gcc and smatch happy
fs: add missing documentation to simple_xattr functions
cgroup: add documentation on extended attributes usage
cgroup: rename subsys_bits to subsys_mask
cgroup: add xattr support
cgroup: revise how we re-populate root directory
xattr: extract simple_xattr code from tmpfs
Pull workqueue changes from Tejun Heo:
"This is workqueue updates for v3.7-rc1. A lot of activities this
round including considerable API and behavior cleanups.
* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.
* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.
These two delayed_work changes make delayed_work provide interface
and behave like timer which is executed with process context.
* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.
All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]_work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.
* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.
* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.
There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."
Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.
Tejun pointed out a few of them, I fixed a couple more.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...
Later changes need to be able to refer to neighbour attributes
when doing fdb_add.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create(). However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there. For example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.
It turns out that's not the only problem here. While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.
So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation. Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.
This makes immediate use of the new __rpc_clone_client() helper.
This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor. Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.
To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
rpc_clone_client() does most of the same tasks as rpc_new_client(),
so there is an opportunity for code re-use. Create a generic helper
that makes it easy to clone an RPC client while replacing any of the
clnt's parameters.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Some function names have changed, but debugging messages
were never updated. Automate the construction of the function name
in debugging messages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: The blank space in front of the message must be spaces.
Tabs show up on the console as a graphical character.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we are creating an osd request and get an invalid layout, return
an EINVAL to the caller. We switch up the return to have an error
code instead of NULL implying -ENOMEM.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Use be16 consistently when looking at flags.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because sizeof() is size_t then if "len" is negative, it counts as a
large positive value.
The call tree looks like:
pfkey_sendmsg()
-> pfkey_process()
-> pfkey_spdadd()
-> parse_ipsecrequests()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add GRO capability to IPv4 GRE tunnels, using the gro_cells
infrastructure.
Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
checking GRO is building large packets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new include file (include/net/gro_cells.h), to bring GRO
(Generic Receive Offload) capability to tunnels, in a modular way.
Because tunnels receive path is lockless, and GRO adds a serialization
using a napi_struct, I chose to add an array of up to
DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be
slowed down because of GRO layer.
skb_get_rx_queue() is used as selector.
In the future, we might add optional fanout capabilities, using rxhash
for example.
With help from Ben Hutchings who reminded me
netif_get_num_default_rss_queues() function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb with CHECKSUM_NONE cant currently be handled by GRO, and
we notice this deep in GRO stack in tcp[46]_gro_receive()
But there are cases where GRO can be a benefit, even with a lack
of checksums.
This preliminary work is needed to add GRO support
to tunnels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), two routes
are added:
- one in the local table:
local 2002::1 via :: dev lo proto none metric 0
- one the in main table (for the prefix):
unreachable 2002::1 dev lo proto kernel metric 256 error -101
When the address is deleted, the route inserted in the main table remains
because we use rt6_lookup(), which returns NULL when dst->error is set, which
is the case here! Thus, it is better to use ip6_route_lookup() to avoid this
kind of filter.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ec47ea824774(skb: Add inline helper for getting the skb end offset from
head) introduces this helper function, skb_end_offset(),
we should make use of it.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using list_move_tail() instead of list_del() + list_add_tail().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>
Make ceph_monc_do_poolop() static to remove the following sparse warning:
* net/ceph/mon_client.c:616:5: warning: symbol 'ceph_monc_do_poolop' was not
declared. Should it be static?
Also drops the 'ceph_monc_' prefix, now being a private function.
Signed-off-by: Iulius Curt <icurt@ixiacom.com>
Signed-off-by: Sage Weil <sage@inktank.com>
As we skipped the merge window for 3.6-rc1 for the tty tree, everything
is now settled down and working properly, so we are ready for 3.7-rc1.
Here's the patchset, it's big, but the large changes are removing a
firmware file and adding a staging tty driver (it depended on the tty
core changes, so it's going through this tree instead of the staging
tree.)
All of these patches have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlBp36oACgkQMUfUDdst+yk4WgCdEy13hot8fI2Lqnc7W0LKu7GX
4p8AoLTjzrXhLosxdijskDQ9X1OtjrxU
=S5Ng
-----END PGP SIGNATURE-----
Merge tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY changes from Greg Kroah-Hartman:
"As we skipped the merge window for 3.6-rc1 for the tty tree,
everything is now settled down and working properly, so we are ready
for 3.7-rc1. Here's the patchset, it's big, but the large changes are
removing a firmware file and adding a staging tty driver (it depended
on the tty core changes, so it's going through this tree instead of
the staging tree.)
All of these patches have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up more-or-less trivial conflicts in
- drivers/char/pcmcia/synclink_cs.c:
tty NULL dereference fix vs tty_port_cts_enabled() helper function
- drivers/staging/{Kconfig,Makefile}:
add-add conflict (dgrp driver added close to other staging drivers)
- drivers/staging/ipack/devices/ipoctal.c:
"split ipoctal_channel from iopctal" vs "TTY: use tty_port_register_device"
* tag 'tty-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (235 commits)
tty/serial: Add kgdb_nmi driver
tty/serial/amba-pl011: Quiesce interrupts in poll_get_char
tty/serial/amba-pl011: Implement poll_init callback
tty/serial/core: Introduce poll_init callback
kdb: Turn KGDB_KDB=n stubs into static inlines
kdb: Implement disable_nmi command
kernel/debug: Mask KGDB NMI upon entry
serial: pl011: handle corruption at high clock speeds
serial: sccnxp: Make 'default' choice in switch last
serial: sccnxp: Remove mask termios caps for SW flow control
serial: sccnxp: Report actual baudrate back to core
serial: samsung: Add poll_get_char & poll_put_char
Powerpc 8xx CPM_UART setting MAXIDL register proportionaly to baud rate
Powerpc 8xx CPM_UART maxidl should not depend on fifo size
Powerpc 8xx CPM_UART too many interrupts
Powerpc 8xx CPM_UART desynchronisation
serial: set correct baud_base for EXSYS EX-41092 Dual 16950
serial: omap: fix the reciever line error case
8250: blacklist Winbond CIR port
8250_pnp: do pnp probe before legacy probe
...
Here is the big driver core update for 3.7-rc1.
A number of firmware_class.c updates (as you saw a month or so ago), and
some hyper-v updates and some printk fixes as well. All patches that
are outside of the drivers/base area have been acked by the respective
maintainers, and have all been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlBp3vkACgkQMUfUDdst+ylQoACgldktGFgkCLzH+rGYthrXOC5P
9hUAnjmOhdoHlMTL81vWTlH+BrGernym
=khrr
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core merge from Greg Kroah-Hartman:
"Here is the big driver core update for 3.7-rc1.
A number of firmware_class.c updates (as you saw a month or so ago),
and some hyper-v updates and some printk fixes as well. All patches
that are outside of the drivers/base area have been acked by the
respective maintainers, and have all been in the linux-next tree for a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
memory: tegra{20,30}-mc: Fix reading incorrect register in mc_readl()
device.h: Add missing inline to #ifndef CONFIG_PRINTK dev_vprintk_emit
memory: emif: Add ifdef CONFIG_DEBUG_FS guard for emif_debugfs_[init|exit]
Documentation: Fixes some translation error in Documentation/zh_CN/gpio.txt
Documentation: Remove 3 byte redundant code at the head of the Documentation/zh_CN/arm/booting
Documentation: Chinese translation of Documentation/video4linux/omap3isp.txt
device and dynamic_debug: Use dev_vprintk_emit and dev_printk_emit
dev: Add dev_vprintk_emit and dev_printk_emit
netdev_printk/netif_printk: Remove a superfluous logging colon
netdev_printk/dynamic_netdev_dbg: Directly call printk_emit
dev_dbg/dynamic_debug: Update to use printk_emit, optimize stack
driver-core: Shut up dev_dbg_reatelimited() without DEBUG
tools/hv: Parse /etc/os-release
tools/hv: Check for read/write errors
tools/hv: Fix exit() error code
tools/hv: Fix file handle leak
Tools: hv: Implement the KVP verb - KVP_OP_GET_IP_INFO
Tools: hv: Rename the function kvp_get_ip_address()
Tools: hv: Implement the KVP verb - KVP_OP_SET_IP_INFO
Tools: hv: Add an example script to configure an interface
...
Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
John W. Linville says:
====================
Here is another batch of updates intended for 3.7...
Highlights include an hci_connect re-write in Bluetooth, HCI/LLC
layer separation in NFC, removal of the raw pn544 NFC driver, NFC LLCP
raw sockets support, improved IBSS auth frame handling in mac80211,
full-MAC AP mode notification support in mac80211, a lot of attention
paid to brcmfmac, and the usual level of updates to iwlwifi, ath9k,
mwifiex, and rt2x00, and various other updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't need more than 1 worker thread per cpu, since rpciod
is designed to run without sleeping in most cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fib6_add_1() should consistently return errno pointers,
rather than a mixture of NULL and errno pointers.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We only have to call xdr_shrink_pagelen() if the remaining RPC
message does not fit in the page buffer length that we supplied
to xdr_align_pages().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use percpu order-0 pages in __netdev_alloc_frag
to deliver fragments used by __netdev_alloc_skb()
Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
- Better filling of space (the ending hole overhead is less an issue)
- Less calls to page allocator or accesses to page->_count
- Could allow struct skb_shared_info futures changes without major
performance impact.
This patch implements a transparent fallback to smaller
pages in case of memory pressure.
It also uses a standard "struct page_frag" instead of a custom one.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When jiffies wraps around (for example, 5 minutes after the boot, see
INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be
< XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus
some icmp packets can be unexpectedly dropped.
Fix this case by initializing last_rate to 60 seconds in the past.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sock *sk is not used inside tcp_v4_save_options. Thus it can be
removed.
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_parse_header() callers provide 8 bytes of storage,
so it's not possible to store an IPv6 address.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems sk_init() has no value today and even does strange things :
# grep . /proc/sys/net/core/?mem_*
/proc/sys/net/core/rmem_default:212992
/proc/sys/net/core/rmem_max:131071
/proc/sys/net/core/wmem_default:212992
/proc/sys/net/core/wmem_max:131071
We can remove it completely.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC refuses to recognize that all error control flows do in fact
set err to something.
Add an explicit initialization to shut it up.
net/sched/sch_drr.c: In function ‘drr_enqueue’:
net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/sched/sch_qfq.c: In function ‘qfq_enqueue’:
net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: David S. Miller <davem@davemloft.net>
fix copy-paste error introduced in linux-next commit
"ipv6: add a new namespace for nf_conntrack_reasm"
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux tunnels were written before RFC6040 and therefore never
implemented the corner case of ECN getting set in the outer header
and the inner header not being ready for it.
Section 4.2. Default Tunnel Egress Behaviour.
o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
propagate any other ECN codepoint onwards. This is because the
inner Not-ECT marking is set by transports that rely on dropped
packets as an indication of congestion and would not understand or
respond to any other ECN codepoint [RFC4774]. Specifically:
* If the inner ECN field is Not-ECT and the outer ECN field is
CE, the decapsulator MUST drop the packet.
* If the inner ECN field is Not-ECT and the outer ECN field is
Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
outgoing packet with the ECN field cleared to Not-ECT.
This patch moves the ECN decap logic out of the individual tunnels
into a common place.
It also adds logging to allow detecting broken systems that
set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header).
Overloads rx_frame_error to keep track of ECN related error.
Thanks to Chris Wright who caught this while reviewing the new VXLAN
tunnel.
This code was tested by injecting faulty logic in other end GRE
to send incorrectly encapsulated packets.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The handlers for xfrm_tunnel are always invoked with rcu read lock
already.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gre function pointers for receive and error handling are
always called (from gre.c) with rcu_read_lock already held.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRE driver incorrectly uses zero as a flag value. Zero is a perfectly
valid value for key, and the tunnel should match packets with no key only
with tunnels created without key, and vice versa.
This is a slightly visible change since previously it might be possible to
construct a working tunnel that sent key 0 and received only because
of the key wildcard of zero. I.e the sender sent key of zero, but tunnel
was defined without key.
Note: using gre key 0 requires iproute2 utilities v3.2 or later.
The original utility code was broken as well.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function genlmsg_put() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check should
be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
If time allows, I'd appreciate if you can take the following fix
for the xt_limit match.
As Jan indicates, random things may occur while using the xt_limit
match due to use of uninitialized memory.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink_register_notifier requires notify functions to not sleep.
nfc_stop_poll locks device mutex and must not be called from notifier.
Create workqueue that will handle this for all devices.
BUG: sleeping function called from invalid context at kernel/mutex.c:269
in_atomic(): 0, irqs_disabled(): 0, pid: 4497, name: neard
1 lock held by neard/4497:
Pid: 4497, comm: neard Not tainted 3.5.0-999-nfc+ #5
Call Trace:
[<ffffffff810952c5>] __might_sleep+0x145/0x200
[<ffffffff81743dde>] mutex_lock_nested+0x2e/0x50
[<ffffffff816ffd19>] nfc_stop_poll+0x39/0xb0
[<ffffffff81700a17>] nfc_genl_rcv_nl_event+0x77/0xc0
[<ffffffff8174aa8c>] notifier_call_chain+0x5c/0x120
[<ffffffff8174abd6>] __atomic_notifier_call_chain+0x86/0x140
[<ffffffff8174ab50>] ? notifier_call_chain+0x120/0x120
[<ffffffff815e1347>] ? skb_dequeue+0x67/0x90
[<ffffffff8174aca6>] atomic_notifier_call_chain+0x16/0x20
[<ffffffff8162119a>] netlink_release+0x24a/0x280
[<ffffffff815d7aa8>] sock_release+0x28/0xa0
[<ffffffff815d7be7>] sock_close+0x17/0x30
[<ffffffff811b2a7c>] __fput+0xcc/0x250
[<ffffffff811b2c0e>] ____fput+0xe/0x10
[<ffffffff81085009>] task_work_run+0x69/0x90
[<ffffffff8101b951>] do_notify_resume+0x81/0xd0
[<ffffffff8174ef22>] int_signal+0x12/0x17
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is used when CONFIG_NFC_SHDLC is disabled.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This adds support for socket of type SOCK_RAW to LLCP.
sk_buff are copied and sent to raw sockets with a 2 bytes extra header:
The first byte header contains the nfc adapter index.
The second one contains flags:
- 0x01 - Direction (0=RX, 1=TX)
- 0x02-0x80 - Reserved
A raw socket has to be explicitly bound to a nfc adapter. This is achieved
by specifying the adapter index to be bound to in the dev_idx field of the
sockaddr_nfc_llcp struct passed to bind().
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.
tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Both modular callers of sock_map_fd() had been buggy; sctp one leaks
descriptor and file if copy_to_user() fails, 9p one shouldn't be
exposing file in the descriptor table at all.
Switch both to sock_alloc_file(), export it, unexport sock_map_fd() and
make it static.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The callers of xdr_align_pages() expect it to return the number of bytes
of actual XDR data remaining in the pages.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.
Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matchings due to use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull more networking fixes from David Miller:
1) Eric Dumazet discovered and fixed what turned out to be a family of
bugs. These functions were using pskb_may_pull() which might need
to reallocate the linear SKB data buffer, but the callers were not
expecting this possibility. The callers have cached pointers to the
packet header areas, and would need to reload them if we were to
continue using pskb_may_pull().
So they could end up reading garbage.
It's easier to just change these RAW4/RAW6/MIP6 routines to use
skb_header_pointer() instead of pskb_may_pull(), which won't modify
the linear SKB data area.
2) Dave Jone's syscall spammer caught a case where a non-TCP socket can
call down into the TCP keepalive code. The case basically involves
creating a raw socket with sk_protocol == IPPROTO_TCP, then calling
setsockopt(sock_fd, SO_KEEPALIVE, ...)
Fixed by Eric Dumazet.
3) Bluetooth devices do not get configured properly while being powered
on, resulting in always using legacy pairing instead of SSP. Fix
from Andrzej Kaczmarek.
4) Bluetooth cancels delayed work erroneously, put stricter checks in
place. From Andrei Emeltchenko.
5) Fix deadlock between cfg80211_mutex and reg_regdb_search_mutex in
cfg80211, from Luis R. Rodriguez.
6) Fix interrupt double release in iwlwifi, from Emmanuel Grumbach.
7) Missing module license in bcm87xx driver, from Peter Huewe.
8) Team driver can lose port changed events when adding devices to a
team, fix from Jiri Pirko.
9) Fix endless loop when trying ot unregister PPPOE device in zombie
state, from Xiaodong Xu.
10) batman-adv layer needs to set MAC address of software device
earlier, otherwise we call tt_local_add with it uninitialized.
11) Fix handling of KSZ8021 PHYs, it's matched currently by KS8051 but
that doesn't program the device properly. From Marek Vasut.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
ipv6: mip6: fix mip6_mh_filter()
ipv6: raw: fix icmpv6_filter()
net: guard tcp_set_keepalive() to tcp sockets
phy/micrel: Add missing header to micrel_phy.h
phy/micrel: Rename KS80xx to KSZ80xx
phy/micrel: Implement support for KSZ8021
batman-adv: Fix symmetry check / route flapping in multi interface setups
batman-adv: Fix change mac address of soft iface.
pppoe: drop PPPOX_ZOMBIEs in pppoe_release
team: send port changed when added
ipv4: raw: fix icmp_filter()
net/phy/bcm87xx: Add MODULE_LICENSE("GPL") to GPL driver
iwlwifi: don't double free the interrupt in failure path
cfg80211: fix possible circular lock on reg_regdb_search()
Bluetooth: Fix not removing power_off delayed work
Bluetooth: Fix freeing uninitialized delayed works
Bluetooth: mgmt: Fix enabling LE while powered off
Bluetooth: mgmt: Fix enabling SSP while powered off
mip6_mh_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix the behaviour of batman-adv in case of virtual interface MAC change event
- fix symmetric link check in neighbour selection
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlBffHkACgkQpGgxIkP9cweh4gCfRow8tAL8CnrzFV7cAyTXrZ3K
sGkAoIOVe1hbuv4kfAh3eLz1kbd28y5n
=1xhN
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included fixes:
- fix the behaviour of batman-adv in case of virtual interface MAC change event
- fix symmetric link check in neighbour selection
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 5e953778a2 ("ipconfig: add nameserver
IPs to kernel-parameter ip=") introduces ic_nameservers_predef() that defined
only for BOOTP. However it is used by ip_auto_config_setup() as well. This
patch moves it outside of #ifdef BOOTP.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Christoph Fritz <chf.fritz@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
icmpv6_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Also, if icmpv6 header cannot be found, do not deliver the packet,
as we do in IPv4.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current regulatory code on cfg80211 performs a check to
see if a regulatory rule belongs to an IEEE band so that if
a Country IE is received and no rules are specified for a
band (which is allowed by IEEE) those bands are left intact.
The current band check assumes a rule is bound to a band
if the rule's start or end frequency is less than 2 GHz
apart from the center of frequency being inspected.
In order to support 60 GHz for 802.11ad we need to increase
this to account for the channel spacing of 2160 MHz whereby
a channel somewhere in the middle of a regulatory rule may
be more than 2 GHz apart from either the beginning or
end of the frequency rule.
Without a fix for this even though channels 1-3 are allowed world
wide on the rule (57240 - 63720 @ 2160), channel 2 at 60480 MHz
will end up getting disabled given that it is 3240 MHz from
both the frequency rule start and end frequency. Fix this by
using 2 GHz separation assumption for the 2.4 and 5 GHz bands
but for 60 GHz use a 10 GHz separation before assuming a rule
is not part of the band.
Since we have no 802.11ad drivers yet merged this change has
no impact to existing Linux upstream device drivers.
Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 5640f76858 ("net: use a per task frag allocator")
accidentally contained an unrelated change to net/ipv4/raw.c,
later committed (without the pr_err() debugging bits) in
net tree as commit ab43ed8b74 (ipv4: raw: fix icmp_filter())
This patch reverts this glitch, noticed by Stephen Rothwell.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull this last(?) batch of fixes intended for 3.6...
For the Bluetooth bits, Gustavo says this:
"Here goes probably my last update to 3.6. It includes the two patches
you were ok last week(from Andrzej Kaczmarek), those are critical
ones, and two other fixes one for a system crash and the other for
a missing lockdep annotation."
The referenced fixes from Andrzej prevent attempts to configure devices
that are powered-off.
Along with the Bluetooth fixes, there are a couple of 802.11 fixes.
Emmanuel Grumbach gives us an iwlwifi fix to prevent releasing an
interrupt twice. Luis R. Rodriguez provides a fix for a possible
circular lock dependency in the cfg80211 regulatory enforcement code.
All of these have been in linux-next for a few days. I hope they are
not too late to make the 3.6 release!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull two ceph fixes from Sage Weil:
"The first fixes a leak in the rbd setup error path, and the second
fixes a more serious problem with mismatched kmap/kunmap that surfaced
after the recent refactoring work."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: only kunmap kmapped pages
rbd: drop dev reference on error in rbd_open()
During processing incoming RSET frame chip, possibly due to
its internal timout, can retrnasmit an another RSET which
is next queued for processing in shdlc layer.
In case when we accept processed RSET skip those remaining on
the rcv queue until chip will send it's first S or I frame.
This will mean the chip completed connection as well.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
As queue_work() does not guarantee immediate execution of sm_work it
can happen in crossover RSET usecase that connect timer will constantly
change the shdlc state from NEGOTIATING to CONNECTING before shdlc has
chance to handle incoming frame.
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@tieto.com>
Acked-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The previous shdlc HCI driver and its header are removed from the tree.
PN544 now registers directly with HCI and passes the name of the llc it
requires (shdlc).
HCI instantiation now allocates the required llc instance. The llc is
started when the HCI device is brought up.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is used by HCI drivers such as the one for the pn544 which require
communications between HCI and the chip to use shdlc.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This is a passthrough llc. It can be used by HCI drivers that don't
need link layer control. HCI will then write directly to the driver, and
driver will deliver incoming frames directly to HCI without any
processing.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The LLC layer manages modules that control the link layer protocol (such
as shdlc) between HCI and an HCI driver. The driver must simply specify
the required llc when it registers with HCI.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This enables the completion callback to be called from a different
context, preventing a possible deadlock if the callback resulted in the
invocation of a nested call to the currently locked nfc_dev.
This is also more in line with the im_transceive nfc_ops for NFC Core or
NCI drivers which already behave asynchronously.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This method initiates execution of an HCI cmd. Result will be delivered
through an asynchronous callback.
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Make it match the data_exchange_cb_t so that it can be used directly in
the implementation of an asynchronous hci_transceive
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>