Commit Graph

45751 Commits

Author SHA1 Message Date
Aneesh Kumar K.V
abfa034e4b fs/9p: Update zero-copy implementation in 9p
* remove lot of update to different data structure
* add a seperate callback for zero copy request.
* above makes non zero copy code path simpler
* remove conditionalizing TREAD/TREADDIR/TWRITE in the zero copy path
* Fix the dotu p9_check_errors with zero copy. Add sufficient doc around
* Add support for both in and output buffers in zero copy callback
* pin and unpin pages in the same context
* use helpers instead of defining page offset and rest of page ourself
* Fix mem leak in p9_check_errors
* Remove 'E' and 'F' in p9pdu_vwritef

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:11 -05:00
Eric Dumazet
66b13d99d9 ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.

In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.

Example of things that were broken :
  - Routing using TOS as a selector
  - Firewalls
  - Trafic classification / shaping

We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.

Notes :
 - We still reflect incoming TOS in RST messages.
 - We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
 - A patch is needed for IPv6

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 03:06:21 -04:00
Richard Cochran
da92b194cc net: hold sock reference while processing tx timestamps
The pair of functions,

 * skb_clone_tx_timestamp()
 * skb_complete_tx_timestamp()

were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.

As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.

These functions first appeared in v2.6.36.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:54:50 -04:00
Eric Dumazet
318cf7aaa0 tcp: md5: add more const attributes
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:46:04 -04:00
Rick Jones
8f9f4668b3 Add ethtool -g support to virtio_net
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:07:21 -04:00
Eric Dumazet
ca35a0ef85 tcp: md5: dont write skb head in tcp_md5_hash_header()
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.

Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 01:52:35 -04:00
Nicholas Bellinger
2e982ab92d target: Remove legacy se_task->task_timer and associated logic
This patch removes the legacy usage of se_task->task_timer and associated
infrastructure that originally was used as a way to help manage buggy backend
SCSI LLDs that in certain cases would never return back an outstanding task.

This includes the removal of target_complete_timeout_work(), timeout logic
from transport_complete_task(), transport_task_timeout_handler(),
transport_start_task_timer(), the per device task_timeout configfs attribute,
and all task_timeout associated structure members and defines in
target_core_base.h

This is being removed in preparation to make transport_complete_task() run
in lock-less mode.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:22:08 +00:00
Nicholas Bellinger
415a090ade target: Fix incorrect transport_sent usage
This patch converts target-core to use se_cmd->t_transport_sent instead of
a duplicated se_cmd->transport_sent member in a handful of locations.
It also updates iscsi_target to properly use ->t_transport_sent instead of
it's own iscsi_cmd_t->transport_sent value that was not being assigned.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:22:06 +00:00
Christoph Hellwig
7c1c6af37a target: remove the task_sg_bidi field se_task and pSCSI BIDI support
This field is never used given that BIDI handling happens at the
command and not the task level.  Remove it and the dead code in
pscsi that tries to work on it.

It also prevents pSCSI passthrough for the two currently enabled BIDI
commands now that task->task_sg_bidi support has been removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:56 +00:00
Nicholas Bellinger
dbc5623eb2 target: transport_subsystem_check_init cleanups
Remove the now unnecessary extra call to transport_subsystem_check_init() in
target_core_register_fabric(), and also merge transport_subsystem_reqmods()
directly into transport_subsystem_check_init().

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:54 +00:00
Christoph Hellwig
35e0e75753 target: use a workqueue for I/O completions
Instead of abusing the target processing thread for offloading I/O
completion in the backends to user context add a new workqueue.  This means
completions can be processed as fast as available CPU time allows it,
including in parallel with other completions and more importantly I/O
submission or QUEUE FULL retries.  This should give much better performance
especially on loaded systems.

As a fallout we can merge all the completed states into a single
one.

On the downside this change complicates lun reset handling a bit by
requiring us to cancel a work item only for those states that have it
initialized.  The alternative would be to either always initialize the work
item to a dummy handler, or always use the same handler and do a switch on
the state. The long term solution will be a flag that says that the command
has an initialized work item, but that's only going to be useful once we
have more users.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:52 +00:00
Christoph Hellwig
59aaad1ec4 target: remove unused TRANSPORT_ states
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:51 +00:00
Christoph Hellwig
f2da9dbdb5 target: remove TRANSPORT_DEFERRED_CMD state
We never check for this state, and it makes testing for a completed
state much harder given that it overrides the existing state.

Also remove the unused deferred_t_state which is related to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:49 +00:00
Christoph Hellwig
bfaf40ada2 target: remove the TRANSPORT_REMOVE state
We never queue an command with this state, and only set it in a completely
bogus place in tcm_fc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:47 +00:00
Christoph Hellwig
cc5d0f0f61 target: stop task timers earlier
Currently we stop the timers for all tasks in a command fairly late during
I/O completion, which is fairly pointless and requires all kinds of safety
checks.

Instead delete pending timers early on in transport_complete_task, thus
ensuring no new timers firest after that.  We take t_state_lock a bit later
in that function thus making sure currenly running timers are out of the
criticial section.  To be completely sure the timer has finished we also
add another del_timer_sync call when freeing the task.

This also allows removing TF_TIMER_RUNNING as it would be equivalent
to TF_ACTIVE now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:44 +00:00
Christoph Hellwig
e99d48a62b target: remove TF_TIMER_STOP
TF_TIMER_STOP is useless as it only helps to mitigate a tiny race during
deleting the timer.  But given that we have cleared TF_ACTIVE at this point
we already have another mitigation a few lines down the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:42 +00:00
Christoph Hellwig
cdbb70bb4c target: factor some duplicate code for stopping a task
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:41 +00:00
Christoph Hellwig
f7a5cc0b31 target: remove SCF_EMULATE_QUEUE_FULL
Add a new boolean at_head parameter to transport_add_cmd_to_queue and thus
obsolete the SCF_EMULATE_QUEUE_FULL flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:34 +00:00
Christoph Hellwig
e057f53308 target: remove the transport_qf_callback se_cmd callback
Remove the need for the transport_qf_callback callback by making
sure we have specific states with specific handlers for the two
queue full cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:32 +00:00
Christoph Hellwig
f55918fa32 target: clean up the backend interface to caching parameters
Remove the dpo_emulated, fua_write_emulated, fua_read_emulated and
write_cache_emulated methods, and replace them with a simple bitfields in
se_subsystem_api in those cases where they ever returned one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:31 +00:00
Christoph Hellwig
b937d27052 target: remove the ->transport_split_cdb callback in se_cmd
Add a switch statement implementing the CDB LBA/len update directly
in target_get_task_cdb and remove the old ->transport_split_cdb
callback and all its implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:15 +00:00
Christoph Hellwig
485fd0d1e3 target: replace ->get_cdb with a target_get_task_cdb helper
Instead of calling out to the backends from the core to get a per-task
CDB and then modify it for the LBA/len pair used for this CDB provide
a helper that writes the adjusted CDB into a provided buffer and call
this method from ->do_task in pscsi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:13 +00:00
Christoph Hellwig
3189b067ee target: pack struct se_task more tightly
Rearrange the fields in se_task to avoid holes.  Also increase the
flags field to 16 bits as we have the space for it, and this makes
adding new flags safer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:10 +00:00
Christoph Hellwig
04629b7bde target: Remove unnecessary se_task members
This is a squashed version of the following unnecessary se_task structure
member removal patches:

target: remove the task_execute_queue field in se_task

    Instead of using a separate flag we can simply do list_emptry checks
    on t_execute_list if we make sure to always use list_del_init to remove
    a task from the list.  Also factor some duplicate code into a new
    __transport_remove_task_from_execute_queue helper.

target: remove the read-only task_no field in se_task

    The task_no field never was initialized and only used in debug printks,
    so kill it.

target: remove the task_padded_sg field in se_task

    This field is only check in one place and not actually needed there.

    Rationale:
    - transport_do_task_sg_chain asserts that we have task_sg_chaining
      set early on
    - we only make use of the sg_prev_nents field we calculate based on it
      if there is another sg list that gets chained onto this one, which
      never happens for the last (or only) task.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:08 +00:00
Christoph Hellwig
6c76bf951c target: make more use of the task_flags field in se_task
Replace various atomic_t variables that were mostly under t_state_lock
with new flags in task_flags.  Note that the execution error path
didn't take t_state_lock before, so add it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:06 +00:00
Christoph Hellwig
42bf829eee target: Cleanup unused se_task bits
This is a squashed version of the following se_task cleanup patches:

    target: remove the unused task_state_flags field in se_task
    target: remove the unused se_obj_ptr field in se_task
    target: remove the se_dev field in se_task

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:05 +00:00
Christoph Hellwig
c0427f1556 target: Cleanup unused target_core_base.h bits
This is a squashed version of the following target_core_base.h
cleanup patches:

    target: remove the unused SHUTDOWN_SIGS defintion
    target: remove unused se_mem leftovers
    target: remove the unused map_func_t typedef
    target: move TRANSPORT_IOV_DATA_BUFFER to the iscsi-specific code

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:21:03 +00:00
Nicholas Bellinger
8dc52b5420 target: Merge transport_cmd_finish_abort_tmr into transport_cmd_finish_abort
This patch merges transport_cmd_finish_abort_tmr() logic into a single
transport_cmd_finish_abort() function by adding a cmd->se_tmr_req check
around transport_lun_remove_cmd(), and updates the single caller within
core_tmr_drain_tmr_list().

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:58 +00:00
Nicholas Bellinger
d14921d6ad target: Convert ->transport_wait_for_tasks usage to transport_generic_free_cmd
This patch converts se_cmd->transport_wait_for_tasks(se_cmd, 1) usage to use
transport_generic_free_cmd() directly in target-core and iscsi-target fabric
usage.  The includes:

*) Removal of the optional transport_generic_free_cmd() call from within
   transport_generic_wait_for_tasks()
*) Usage of existing SCF_SUPPORTED_SAM_OPCODE to determine when
   transport_generic_wait_for_tasks() processing may occur instead of
   checking se_cmd->transport_wait_for_tasks()
*) Move transport_generic_wait_for_tasks() call ahead of core_dec_lacl_count()
   and transport_lun_remove_cmd() in transport_generic_free_cmd() to follow
   existing logic for iscsi-target w/ se_cmd->transport_wait_for_tasks(se_cmd, 1)
*) Removal of se_cmd->transport_wait_for_tasks() function pointer
*) Rename transport_generic_wait_for_tasks() -> transport_wait_for_tasks(), and
   add docbook comment.
*) Add EXPORT_SYMBOL for transport_wait_for_tasks()

For the case in iscsi_target_erl2.c:iscsit_prepare_cmds_for_realligance()
where se_cmd->transport_wait_for_tasks(se_cmd, 0) is called, this patch
adds a direct call to transport_wait_for_tasks().

(hch: Fix transport_generic_free_cmd() usage in iscsit_release_commands_from_conn)
(nab: Add patch: Ensure that TMRs hit wait_for_tasks logic during release)

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:54 +00:00
Roland Dreier
dd503a5fcc target: Have core_tmr_alloc_req() take an explicit GFP_xxx flag
Testing in_interrupt() to know when sleeping is allowed is not really
reliable (since eg it won't be true if the caller is holding a spinlock).
Instead have the caller tell core_tmr_alloc_req() what GFP_xxx to use;
every caller except tcm_qla2xxx can use GFP_KERNEL.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:53 +00:00
Christoph Hellwig
a3eedc227b target: remove unused se_subsystem_api methods
The cdb_none, map_data_SG and map_control_SG methods have no callers left
and can be removed now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:46 +00:00
Nicholas Bellinger
39c05f321a target: Remove session_reinstatement parameter from ->transport_wait_for_tasks
This patch removes the unnecessary session_reinstatement parameter from
se_cmd->transport_wait_for_tasks(), logic in transport_generic_wait_for_tasks,
and usage within iscsi-target code.

This also includes the removal of the 'bool' return from transport_put_cmd() +
transport_generic_free_cmd() that is no longer necessary.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:39 +00:00
Christoph Hellwig
82f1c8a4e7 target: push session reinstatement out of transport_generic_free_cmd
Push session reinstatement out of transport_generic_free_cmd into the only
caller that actually needs it.  Clean up transport_generic_free_cmd a bit,
and remove the useless comment.  I'd love to add a more useful kerneldoc
comment for it, but as this point I'm still a bit confused in where it
stands in the command release stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:38 +00:00
Christoph Hellwig
2dbc43d256 target: remove transport_free_se_cmd
It is only called by transport_release_cmd, so inline it there.  Also add
a kerneldoc comment for transport_release_cmd while we are at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:31 +00:00
Christoph Hellwig
680b73c5f2 target: remove transport_generic_handle_cdb
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-10-24 03:20:28 +00:00
Eric Dumazet
b5d9c9c281 inet: add rfc 3168 extract in front of INET_ECN_encapsulate()
INET_ECN_encapsulate() is better understood if we can read the official
statement.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-22 01:25:23 -04:00
Rafael J. Wysocki
d033e07856 Merge branch 'pm-domains' into pm-for-linus
* pm-domains:
  ARM: mach-shmobile: sh7372 A4R support (v4)
  ARM: mach-shmobile: sh7372 A3SP support (v4)
  PM / Sleep: Mark devices involved in wakeup signaling during suspend
2011-10-22 00:21:52 +02:00
Rafael J. Wysocki
4ca46ff3e0 PM / Sleep: Mark devices involved in wakeup signaling during suspend
The generic PM domains code in drivers/base/power/domain.c has
to avoid powering off domains that provide power to wakeup devices
during system suspend.  Currently, however, this only works for
wakeup devices directly belonging to the given domain and not for
their children (or the children of their children and so on).
Thus, if there's a wakeup device whose parent belongs to a power
domain handled by the generic PM domains code, the domain will be
powered off during system suspend preventing the device from
signaling wakeup.

To address this problem introduce a device flag, power.wakeup_path,
that will be set during system suspend for all wakeup devices,
their parents, the parents of their parents and so on.  This way,
all wakeup paths in the device hierarchy will be marked and the
generic PM domains code will only need to avoid powering off
domains containing devices whose power.wakeup_path is set.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-22 00:19:29 +02:00
Eric Dumazet
cf533ea53e tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 05:22:42 -04:00
Eric W. Biederman
e09eff7fc1 macvtap: Fix the minor device number allocation
On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices.  I have systems running
automated tests that that hit this condition in just a few days.

Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.

Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device.  With macvtap
specific data structures it is impossible to find any other
kind of networking device.

Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers.  It doesn't solve the
original problem but there is no penalty for a larger minor
device range.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:07 -04:00
Ian Campbell
a8605c6063 net: add opaque struct around skb frag page
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:52:53 -04:00
Maciej Żenczykowski
6cc7a765c2 net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.

- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
  sockets already pretty much effectively allow you
  to emulate socket transparency.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 18:21:36 -04:00
Eric Dumazet
05bdd2f143 net: constify skbuff and Qdisc elements
Preliminary patch before tcp constification

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 17:45:43 -04:00
Stephen Warren
a5818a8bd0 pinctrl: get_group_pins() const fixes
get_group_pins() "returns" a pointer to an array of const objects, through
a pointer parameter. Fix the prototype so what's pointed at by the returned
pointer is const, rather than the function parameter being const.

This also allows the removal of a cast in each of the two current pinmux
drivers.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-10-20 11:41:49 +02:00
Ian Campbell
30d3c128ea mm: add a "struct page_frag" type containing a page, offset and length
A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.

A structure for reference sub-page regions seems like a generally useful thing
so do so instead of adding a network subsystem specific structure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 04:58:32 -04:00
Ian Campbell
a0bec1cd8f net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f16 cxgb3: convert to SKB paged frag API.
5dc3e196ea myri10ge: convert to SKB paged frag API.
0e0634d20d vmxnet3: convert to SKB paged frag API.
86ee8130a4 virtionet: convert to SKB paged frag API.
4a22c4c919 sfc: convert to SKB paged frag API.
18324d690d cassini: convert to SKB paged frag API.
b061b39e3a benet: convert to SKB paged frag API.
b7b6a688d2 bnx2: convert to SKB paged frag API.
804cf14ea5 net: xfrm: convert to SKB frag APIs
ea2ab69379 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:40:39 -04:00
Dan Carpenter
4f25af2782 filter: use unsigned int to silence static checker warning
This is just a cleanup.

My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
	warn: check 'flen' for negative values

flen comes from the user.  We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative.  This is a bug, but it's not exploitable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:35:51 -04:00
Eric W. Biederman
672d82c18d class: Implement support for class attrs in tagged sysfs directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:24:15 -04:00
Eric W. Biederman
487505c257 sysfs: Implement support for tagged files in sysfs.
Looking up files in sysfs is hard to understand and analyize because we
currently allow placing untagged files in tagged directories.  In the
implementation of that we have two subtly different meanings of NULL.
NULL meaning there is no tag on a directory entry and NULL meaning
we don't care which namespace the lookup is performed for.  This
multiple uses of NULL have resulted in subtle bugs (since fixed)
in the code.

Currently it is only the bonding driver that needs to have an untagged
file in a tagged directory.

To untagle this mess I am adding support for tagged files to sysfs.
Modifying the bonding driver to implement bonding_masters as a tagged
file.  Registering bonding_masters once for each network namespace.
Then I am removing support for untagged entries in tagged sysfs
directories.

Resulting in code that is much easier to reason about.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:24:14 -04:00
Richard Cochran
4dc360c5e7 net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 17:00:35 -04:00
Trond Myklebust
fba730050d NFS: Don't rely on PageError in nfs_readpage_release_partial
Don't rely on the PageError flag to tell us if one of the partial reads of
the page failed. Instead, replace that with a dedicated flag in the
struct nfs_page.

Then clean out redundant uses of the PageError flag: the VM no longer
checks it for reads.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:38 -07:00
Trond Myklebust
b8ef70639b NFS: Get rid of the unused nfs_write_data->flags field
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:37:34 -07:00
Trond Myklebust
a1940805d0 NFS: Get rid of the unused nfs_read_data->flags field
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:37:34 -07:00
Andy Fleming
3d153a7c8b net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility.  We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 15:59:45 -04:00
J. Bruce Fields
8b289b2c23 nfsd4: implement new 4.1 open reclaim types
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-19 11:52:12 -04:00
Yevgeny Petrilin
f3a9d1f25d mlx4_en: Controlling FCS header removal
Canceling FCS removal where FW allows for better alignment
of incoming data.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:42:26 -04:00
Daniel Martensson
5ea2ef5f8b caif-hsi: Added recovery check of CA wake status.
Added recovery check of CA wake status in case of wake up timeout.
Added check of CA wake status in case of wake down timeout.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:25:43 -04:00
Daniel Martensson
5bbed92d3d caif-hsi: Added sanity check for length of CAIF frames
Added sanity check for length of CAIF frames, and tear down of
CAIF link-layer device upon protocol error.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:25:42 -04:00
Dmitry Tarnyagin
28bd204942 caif-hsi: Make inactivity timeout configurable.
CAIF HSI uses a timer for inactivity. Upon timeout HSI-wake signaling
is initiated to allow power-down of the HSI block.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:25:42 -04:00
Daniel Martensson
687b13e98a caif-hsi: Making read and writes asynchronous.
Some platforms do not allow to put HSI block into low-power
mode when FIFO is not empty. The patch flushes (by reading)
FIFO at wake down sequence. Asynchronous read and write is
implemented for that. As a side effect this will also greatly
improve performance.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:25:41 -04:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Eric Dumazet
bc416d9768 macvlan: handle fragmented multicast frames
Fragmented multicast frames are delivered to a single macvlan port,
because ip defrag logic considers other samples are redundant.

Implement a defrag step before trying to send the multicast frame.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:22:07 -04:00
Trond Myklebust
0c2e53f11a NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
...and also remove the associated nfs_v4_clientops entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Jiri Slaby
fa90e1c935 TTY: make tty_add_file non-failing
If tty_add_file fails at the point it is now, we have to revert all
the changes we did to the tty. It means either decrease all refcounts
if this was a tty reopen or delete the tty if it was newly allocated.

There was a try to fix this in v3.0-rc2 using tty_release in 0259894c7
(TTY: fix fail path in tty_open). But instead it introduced a NULL
dereference. It's because tty_release dereferences
filp->private_data, but that one is set even in our tty_add_file. And
when tty_add_file fails, it's still NULL/garbage. Hence tty_release
cannot be called there.

To circumvent the original leak (and the current NULL deref) we split
tty_add_file into two functions, making the latter non-failing. In
that case we may do the former early in open, where handling failures
is easy. The latter stays as it is now. So there is no change in
functionality.

The original bug (leak) was introduced by f573bd176 (tty: Remove
__GFP_NOFAIL from tty_add_file()). Thanks Dan for reporting this.

Later, we may split tty_release into more functions and call only some
of them in this fail path instead. (If at all possible.)

Introduced-in: v2.6.37-rc2
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable <stable@vger.kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 14:22:37 -07:00
Thomas Meyer
8193c42906 tty: Support compat_ioctl get/set termios_locked
When running a Fedora 15 (x86) on an x86_64 kernel, in the boot process
plymouthd complains about those two missing ioctls:
[    2.581783] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005457){t:'T';sz:0} arg(ffb6a5d0) on /dev/tty1
[    2.581803] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005456){t:'T';sz:0} arg(ffb6a680) on /dev/tty1

both ioctl functions work on the 'struct termios' resp. 'struct termios2',
which has the same size (36 bytes resp. 44 bytes) on x86 and x86_64,
so it's just a matter of converting the pointer from userland.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 14:17:11 -07:00
Jason Baron
07613b0b5e dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
Replace the repetitive struct _ddebug descriptor definitions with a new
DECLARE_DYNAMIC_DEBUG_META_DATA(name, fmt) macro.

[akpm@linux-foundation.org: s/DECLARE/DEFINE/]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 11:22:00 -07:00
Kai Jiang
27a90700a4 uio: Support physical addresses >32 bits on 32-bit systems
To support >32-bit physical addresses for UIO_MEM_PHYS type we need to
extend the width of 'addr' in struct uio_mem.  Numerous platforms like
embedded PPC, ARM, and X86 have support for systems with larger physical
address than logical.

Since 'addr' may contain a physical, logical, or virtual address the
easiest solution is to just change the type to 'phys_addr_t' which
should always be greater than or equal to the sizeof(void *) such that
it can properly hold any of the address types.

For physical address we can support up to a 44-bit physical address on a
typical 32-bit system as we utilize remap_pfn_range() for the mapping of
the memory region and pfn's are represnted by shifting the address by
the page size (typically 4k).

Signed-off-by: Kai Jiang <Kai.Jiang@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Hans J. Koch <hjk@hansjkoch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-10-18 11:18:57 -07:00
Trond Myklebust
a9a4a87a59 NFS: Use the inode->i_version to cache NFSv4 change attribute information
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:14:34 -07:00
Trond Myklebust
d77385f238 SUNRPC: Fix rpc_sockaddr2uaddr
rpc_sockaddr2uaddr is only used by net/sunrpc/rpcb_clnt.c, where
it is used in a non-blockable context in at least one case.

Add non-blocking capability by adding a gfp_t argument

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:13:32 -07:00
Peng Tao
c1225158a8 SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it
generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
David S. Miller
ae2a458315 Merge branch 'nf' of git://1984.lsi.us.es/net 2011-10-17 19:38:03 -04:00
Marc Kleine-Budde
f861c2b80c can: remove references to berlios mailinglist
The BerliOS project, which currently hosts our mailinglist, will
close with the end of the year. Now take the chance and remove all
occurrences of the mailinglist address from the source files.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-17 19:22:46 -04:00
Gerrit Renker
f36c23bb9f udplite: fast-path computation of checksum coverage
Commit 903ab86d19 of 1 March this year ("udp: Add
lockless transmit path") introduced a new fast TX path that broke the checksum
coverage computation of UDP-lite, which so far depended on up->len (only set
if the socket is locked and 0 in the fast path).

Fixed by providing both fast- and slow-path computation of checksum coverage.
The latter can be removed when UDP(-lite)v6 also uses a lockless transmit path.
 
Reported-by: Thomas Volkert <thomas@homer-conferencing.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-17 19:07:30 -04:00
John W. Linville
41ebe9cde7 Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem 2011-10-17 15:05:26 -04:00
Ian Campbell
9bab0b7fba genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier
This adds a mechanism to resume selected IRQs during syscore_resume
instead of dpm_resume_noirq.

Under Xen we need to resume IRQs associated with IPIs early enough
that the resched IPI is unmasked and we can therefore schedule
ourselves out of the stop_machine where the suspend/resume takes
place.

This issue was introduced by 676dc3cf5b "xen: Use IRQF_FORCE_RESUME".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
Cc: stable@kernel.org (at least to 2.6.32.y)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-17 11:42:49 +02:00
Rafael J. Wysocki
2aede851dd PM / Hibernate: Freeze kernel threads after preallocating memory
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:52 +02:00
H Hartley Sweeten
37cce26b32 PM / VT: Cleanup #if defined uglyness and fix compile error
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup
the #if defined ugliness for the vt suspend support functions. Note that
CONFIG_VT_CONSOLE is already dependant on CONFIG_VT.

The function pm_set_vt_switch is actually dependant on CONFIG_VT and not
CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
not set:

drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here

Also, remove the incorrect path from the comment in console.c.

[rjw: Replaced #if defined() with #ifdef in suspend.h.]

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:51 +02:00
Martin Schwidefsky
85055dd805 PM / Hibernate: Include storage keys in hibernation image on s390
For s390 there is one additional byte associated with each page,
the storage key. This byte contains the referenced and changed
bits and needs to be included into the hibernation image.
If the storage keys are not restored to their previous state all
original pages would appear to be dirty. This can cause
inconsistencies e.g. with read-only filesystems.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:46 +02:00
ShuoX Liu
2a77c46de1 PM / Suspend: Add statistics debugfs file for suspend to RAM
Record S3 failure time about each reason and the latest two failed
devices' names in S3 progress.
We can check it through 'suspend_stats' entry in debugfs.

The motivation of the patch:

We are enabling power features on Medfield. Comparing with PC/notebook,
a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far
more frequently. If it can't enter suspend-2-ram in time, the power
might be used up soon.

We often find sometimes, a device suspend fails. Then, system retries
s3 over and over again. As display is off, testers and developers
don't know what happens.

Some testers and developers complain they don't know if system
tries suspend-2-ram, and what device fails to suspend. They need
such info for a quick check. The patch adds suspend_stats under
debugfs for users to check suspend to RAM statistics quickly.

If not using this patch, we have other methods to get info about
what device fails. One is to turn on  CONFIG_PM_DEBUG, but users
would get too much info and testers need recompile the system.

In addition, dynamic debug is another good tool to dump debug info.
But it still doesn't match our utilization scenario closely.
1) user need write a user space parser to process the syslog output;
2) Our testing scenario is we leave the mobile for at least hours.
   Then, check its status. No serial console available during the
   testing. One is because console would be suspended, and the other
   is serial console connecting with spi or HSU devices would consume
   power. These devices are powered off at suspend-2-ram.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:45 +02:00
Greg Rose
5f8444a3fa if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
Add configuration setting for drivers to turn spoof checking on or off
for discrete VFs.

v2 - Fix indentation problem, wrap the ifla_vf_info structure in
     #ifdef __KERNEL__ to prevent user space from accessing and
     change function paramater for the spoof check setting netdev
     op from u8 to bool.
v3 - Preset spoof check setting to -1 so that user space tools such
     as ip can detect that the driver didn't report a spoofcheck
     setting.  Prevents incorrect display of spoof check settings
     for drivers that don't report it.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-10-16 13:15:38 -07:00
Helmut Schaa
bb6e753e95 nl80211: Add sta_flags to the station info
Reuse the already existing struct nl80211_sta_flag_update to specify
both, a flag mask and the flag set itself. This means
nl80211_sta_flag_update is now used for setting station flags and also
for getting station flags.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-14 14:48:23 -04:00
Helmut Schaa
7f2a5e214d mac80211: Populate radiotap header with MCS info for TX frames
mac80211 already filled in the MCS rate info for rx'ed frames but tx'ed
frames that are sent to a monitor interface during the status callback
lack this information.

Add the radiotap fields for MCS info to ieee80211_tx_status_rtap_hdr
and populate them when sending tx'ed frames to the monitors.

The needed headroom is only extended by one byte since we don't include
legacy rate information in the rtap header for HT frames.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-14 14:48:14 -04:00
Szymon Janc
88149db494 Bluetooth: rfcomm: Fix sleep in invalid context in rfcomm_security_cfm
This was triggered by turning off encryption on ACL link when rfcomm
was using high security. rfcomm_security_cfm (which is called from rx
task) was closing DLC and this involves sending disconnect message
(and locking socket).

Move closing DLC to rfcomm_process_dlcs and only flag DLC for closure
in rfcomm_security_cfm.

BUG: sleeping function called from invalid context at net/core/sock.c:2032
in_atomic(): 1, irqs_disabled(): 0, pid: 1788, name: kworker/0:3
[<c0068a08>] (unwind_backtrace+0x0/0x108) from [<c05e25dc>] (dump_stack+0x20/0x24)
[<c05e25dc>] (dump_stack+0x20/0x24) from [<c0087ba8>] (__might_sleep+0x110/0x12c)
[<c0087ba8>] (__might_sleep+0x110/0x12c) from [<c04801d8>] (lock_sock_nested+0x2c/0x64)
[<c04801d8>] (lock_sock_nested+0x2c/0x64) from [<c05670c8>] (l2cap_sock_sendmsg+0x58/0xcc)
[<c05670c8>] (l2cap_sock_sendmsg+0x58/0xcc) from [<c047cf6c>] (sock_sendmsg+0xb0/0xd0)
[<c047cf6c>] (sock_sendmsg+0xb0/0xd0) from [<c047cfc8>] (kernel_sendmsg+0x3c/0x44)
[<c047cfc8>] (kernel_sendmsg+0x3c/0x44) from [<c056b0e8>] (rfcomm_send_frame+0x50/0x58)
[<c056b0e8>] (rfcomm_send_frame+0x50/0x58) from [<c056b168>] (rfcomm_send_disc+0x78/0x80)
[<c056b168>] (rfcomm_send_disc+0x78/0x80) from [<c056b9f4>] (__rfcomm_dlc_close+0x2d0/0x2fc)
[<c056b9f4>] (__rfcomm_dlc_close+0x2d0/0x2fc) from [<c056bbac>] (rfcomm_security_cfm+0x140/0x1e0)
[<c056bbac>] (rfcomm_security_cfm+0x140/0x1e0) from [<c0555ec0>] (hci_event_packet+0x1ce8/0x4d84)
[<c0555ec0>] (hci_event_packet+0x1ce8/0x4d84) from [<c0550380>] (hci_rx_task+0x1d0/0x2d0)
[<c0550380>] (hci_rx_task+0x1d0/0x2d0) from [<c009ee04>] (tasklet_action+0x138/0x1e4)
[<c009ee04>] (tasklet_action+0x138/0x1e4) from [<c009f21c>] (__do_softirq+0xcc/0x274)
[<c009f21c>] (__do_softirq+0xcc/0x274) from [<c009f6c0>] (do_softirq+0x60/0x6c)
[<c009f6c0>] (do_softirq+0x60/0x6c) from [<c009f794>] (local_bh_enable_ip+0xc8/0xd4)
[<c009f794>] (local_bh_enable_ip+0xc8/0xd4) from [<c05e5804>] (_raw_spin_unlock_bh+0x48/0x4c)
[<c05e5804>] (_raw_spin_unlock_bh+0x48/0x4c) from [<c040d470>] (data_from_chip+0xf4/0xaec)
[<c040d470>] (data_from_chip+0xf4/0xaec) from [<c04136c0>] (send_skb_to_core+0x40/0x178)
[<c04136c0>] (send_skb_to_core+0x40/0x178) from [<c04139f4>] (cg2900_hu_receive+0x15c/0x2d0)
[<c04139f4>] (cg2900_hu_receive+0x15c/0x2d0) from [<c0414cb8>] (hci_uart_tty_receive+0x74/0xa0)
[<c0414cb8>] (hci_uart_tty_receive+0x74/0xa0) from [<c02cbd9c>] (flush_to_ldisc+0x188/0x198)
[<c02cbd9c>] (flush_to_ldisc+0x188/0x198) from [<c00b2774>] (process_one_work+0x144/0x4b8)
[<c00b2774>] (process_one_work+0x144/0x4b8) from [<c00b2e8c>] (worker_thread+0x198/0x468)
[<c00b2e8c>] (worker_thread+0x198/0x468) from [<c00b9bc8>] (kthread+0x98/0xa0)
[<c00b9bc8>] (kthread+0x98/0xa0) from [<c0061744>] (kernel_thread_exit+0x0/0x8)

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-10-14 15:04:54 -03:00
Boaz Harrosh
4b46c9f5cf ore/exofs: Change ore_check_io API
Current ore_check_io API receives a residual
pointer, to report partial IO. But it is actually
not used, because in a multiple devices IO there
is never a linearity in the IO failure.

On the other hand if every failing device is reported
through a received callback measures can be taken to
handle only failed devices. One at a time.

This will also be needed by the objects-layout-driver
for it's error reporting facility.

Exofs is not currently using the new information and
keeps the old behaviour of failing the complete IO in
case of an error. (No partial completion)

TODO: Use an ore_check_io callback to set_page_error only
the failing pages. And re-dirty write pages.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:42 +02:00
Boaz Harrosh
5a51c0c7e9 ore/exofs: Define new ore_verify_layout
All users of the ore will need to check if current code
supports the given layout. For example RAID5/6 is not
currently supported.

So move all the checks from exofs/super.c to a new
ore_verify_layout() to be used by ore users.

Note that any new layout should be passed through the
ore_verify_layout() because the ore engine will prepare
and verify some internal members of ore_layout, and
assumes it's called.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:41 +02:00
Boaz Harrosh
3bd9856857 ore: Support for partial component table
Users like the objlayout-driver would like to only pass
a partial device table that covers the IO in question.
For example exofs divides the file into raid-group-sized
chunks and only serves group_width number of devices at
a time.

The partiality is communicated by setting
ore_componets->first_dev and the array covers all logical
devices from oc->first_dev upto (oc->first_dev + oc->numdevs)

The ore_comp_dev() API receives a logical device index
and returns the actual present device in the table.
An out-of-range dev_index will BUG.

Logical device index is the theoretical device index as if
all the devices of a file are present. .i.e:
	total_devs = group_width * mirror_p1 * group_count
	0 <= dev_index < total_devs

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:41 +02:00
Boaz Harrosh
9826075404 ore: cleanup: Embed an ore_striping_info inside ore_io_state
Now that each ore_io_state covers only a single raid group.
A single striping_info math is needed. Embed one inside
ore_io_state to cache the calculation results and eliminate
an extra call.

Also the outer _prepare_for_striping is removed since it does nothing.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:53:54 +02:00
Eric Dumazet
87fb4b7b53 net: more accurate skb truesize
skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, its time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But its a necessary step for a
more accurate memory accounting.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-13 16:05:07 -04:00
Neil Zhang
dde34cc501 usb: gadget: mv_udc: refine the driver structure
This patch do the following things:

1. Add header and Copyright for marvell usb driver.
2. Add mv_usb.h in include/linux/platform_data, make the driver
   fits all the marvell platform using the same ChipIdea usb ip.
3. Some SOC may has mutiple clock sources, so let me define it
   in mv_usb_platform_data and give two helper functions named
   udc_clock_enable/udc_clock_disable to deal with the clocks.
4. Different SOCs will have some difference in PHY initialization,
   so we will remove file mv_udc_phy.c and add two funtions in
   mv_usb_platform_data, let the platform relative driver to realize it.
5. Rewrite probe function according to the modification list above. Find
   it will kernel panic when probe failed. The root cause is as follows:
	When probe failed, the error handle may call device_unregister()
	which in return will call gadget_release.In current code,
	gadget_release have two issues:
		1: the_controller is a NULL pointer.
		2: if we free udc here, then the following code in probe
		   will access NULL pointer.

Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:56 +03:00
Kuninori Morimoto
f427eb64f4 usb: gadget: renesas_usbhs: support otg pin control
some renesas_usbhs device is supporting OTG external device interface.
In that device, it is necessary to control PWEN/EXTLP on DVSTCTR.
This patch support it.
But renesas_usbhs driver doesn't have OTG support for now.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:47 +03:00
Kuninori Morimoto
258485d990 usb: gadget: renesas_usbhs: add bus control functions
this patch add DVSTCTR control function for HOST support

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:38 +03:00
Kuninori Morimoto
11935de557 usb: gadget: renesas_usbhs: change usbhsc_bus_ctrl() to usbsc_set_buswait()
renesas_usbhs will have register DVSTCTR control function for HOST support.
This patch changes usbhsc_bus_ctrl() to usbsc_set_buswait(),
to remove DVSTCTR access from it,

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:41:37 +03:00
Felipe Balbi
089b837a39 usb: gadget: fix typo for default U1/U2 exit latencies
s/DEFULT/DEFAULT/, no functional changes.

Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:39:59 +03:00
Yoshihiro Shimoda
b8a56e17e1 usb: gadget: r8a66597-udc: add support for SUDMAC
SH7757 has a USB function with internal DMA controller (SUDMAC).
This patch supports the SUDMAC. The SUDMAC is incompatible with
general-purpose DMAC. So, it doesn't use dmaengine.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2011-10-13 20:38:39 +03:00
Linus Walleij
2744e8afb3 drivers: create a pin control subsystem
This creates a subsystem for handling of pin control devices.
These are devices that control different aspects of package
pins.

Currently it handles pinmuxing, i.e. assigning electronic
functions to groups of pins on primarily PGA and BGA type of
chip packages which are common in embedded systems.

The plan is to also handle other I/O pin control aspects
such as biasing, driving, input properties such as
schmitt-triggering, load capacitance etc within this
subsystem, to remove a lot of ARM arch code as well as
feature-creepy GPIO drivers which are implementing the same
thing over and over again.

This is being done to depopulate the arch/arm/* directory
of such custom drivers and try to abstract the infrastructure
they all need. See the Documentation/pinctrl.txt file that is
part of this patch for more details.

ChangeLog v1->v2:

- Various minor fixes from Joe's and Stephens review comments
- Added a pinmux_config() that can invoke custom configuration
  with arbitrary data passed in or out to/from the pinmux driver

ChangeLog v2->v3:

- Renamed subsystem folder to "pinctrl" since we will likely
  want to keep other pin control such as biasing in this
  subsystem too, so let us keep to something generic even though
  we're mainly doing pinmux now.
- As a consequence, register pins as an abstract entity separate
  from the pinmux. The muxing functions will claim pins out of the
  pin pool and make sure they do not collide. Pins can now be
  named by the pinctrl core.
- Converted the pin lookup from a static array into a radix tree,
  I agreed with Grant Likely to try to avoid any static allocation
  (which is crap for device tree stuff) so I just rewrote this
  to be dynamic, just like irq number descriptors. The
  platform-wide definition of number of pins goes away - this is
  now just the sum total of the pins registered to the subsystem.
- Make sure mappings with only a function name and no device
  works properly.

ChangeLog v3->v4:

- Define a number space per controller instead of globally,
  Stephen and Grant requested the same thing so now maps need to
  define target controller, and the radix tree of pin descriptors
  is a property on each pin controller device.
- Add a compulsory pinctrl device entry to the pinctrl mapping
  table. This must match the pinctrl device, like "pinctrl.0"
- Split the file core.c in two: core.c and pinmux.c where the
  latter carry all pinmux stuff, the core is for generic pin
  control, and use local headers to access functionality between
  files. It is now possible to implement a "blank" pin controller
  without pinmux capabilities. This split will make new additions
  like pindrive.c, pinbias.c etc possible for combined drivers
  and chunks of functionality which is a GoodThing(TM).
- Rewrite the interaction with the GPIO subsystem - the pin
  controller descriptor now handles this by defining an offset
  into the GPIO numberspace for its handled pin range. This is
  used to look up the apropriate pin controller for a GPIO pin.
  Then that specific GPIO range is matched 1-1 for the target
  controller instance.
- Fixed a number of review comments from Joe Perches.
- Broke out a header file pinctrl.h for the core pin handling
  stuff that will be reused by other stuff than pinmux.
- Fixed some erroneous EXPORT() stuff.
- Remove mispatched U300 Kconfig and Makefile entries
- Fixed a number of review comments from Stephen Warren, not all
  of them - still WIP. But I think the new mapping that will
  specify which function goes to which pin mux controller address
  50% of your concerns (else beat me up).

ChangeLog v4->v5:

- Defined a "position" for each function, so the pin controller now
  tracks a function in a certain position, and the pinmux maps define
  what position you want the function in. (Feedback from Stephen
  Warren and Sascha Hauer).
- Since we now need to request a combined function+position from
  the machine mapping table that connect mux settings to drivers,
  it was extended with a position field and a name field. The
  name field is now used if you e.g. need to switch between two
  mux map settings at runtime.
- Switched from a class device to using struct bus_type for this
  subsystem. Verified sysfs functionality: seems to work fine.
  (Feedback from Arnd Bergmann and Greg Kroah-Hartman)
- Define a per pincontroller list of GPIO ranges from the GPIO
  pin space that can be handled by the pin controller. These can
  be added one by one at runtime. (Feedback from Barry Song)
- Expanded documentation of regulator_[get|enable|disable|put]
  semantics.
- Fixed a number of review comments from Barry Song. (Thanks!)

ChangeLog v5->v6:

- Create an abstract pin group concept that can sort pins into
  named and enumerated groups no matter what the use of these
  groups may be, one possible usecase is a group of pins being
  muxed in or so. The intention is however to also use these
  groups for other pin control activities.
- Make it compulsory for pinmux functions to associate with
  at least one group, so the abstract pin group concept is used
  to define the groups of pins affected by a pinmux function.
  The pinmux driver interface has been altered so as to enforce
  a function to list applicable groups per function.
- Provide an optional .group entry in the pinmux machine map
  so the map can select beteween different available groups
  to be used with a certain function.
- Consequent changes all over the place so that e.g. debugfs
  present reasonable information about the world.
- Drop the per-pin mux (*config) function in the pinmux_ops
  struct - I was afraid that this would start to be used for
  things totally unrelated to muxing, we can introduce that to
  the generic struct pinctrl_ops if needed. I want to keep
  muxing orthogonal to other pin control subjects and not mix
  these things up.

ChangeLog v6->v7:

- Make it possible to have several map entries matching the
  same device, pin controller and function, but using
  a different group, and alter the semantics so that
  pinmux_get() will pick all matching map entries, and
  store the associated groups in a list. The list will
  then be iterated over at pinmux_enable()/pinmux_disable()
  and corresponding driver functions called for each
  defined group. Notice that you're only allowed to map
  multiple *groups* to the same
  { device, pin controller, function } triplet, attempts
  to map the same device to multiple pin controllers will
  for example fail. This is hopefully the crucial feature
  requested by Stephen Warren.
- Add a pinmux hogging field to the pinmux mapping entries,
  and enable the pinmux core to hog pinmux map entries.
  This currently only works for pinmuxes without assigned
  devices as it looks now, but with device trees we can
  look up the corresponding struct device * entries when
  we register the pinmux driver, and have it hog each
  pinmux map in turn, for a simple approach to
  non-dynamic pin muxing. This addresses an issue from
  Grant Likely that the machine should take care of as
  much of the pinmux setup as possible, not the devices.
  By supplying a list of hogs, it can now instruct the
  core to take care of any static mappings.
- Switch pinmux group retrieveal function to grab an
  array of strings representing the groups rather than an
  array of unsigned and rewrite accordingly.
- Alter debugfs to show the grouplist handled by each
  pinmux. Also add a list of hogs.
- Dynamically allocate a struct pinmux at pinmux_get() and
  free it at pinmux_put(), then add these to the global
  list of pinmuxes active as we go along.
- Go over the list of pinmux maps at pinmux_get() time
  and repeatedly apply matches.
- Retrieve applicable groups per function from the driver
  as a string array rather than a unsigned array, then
  lookup the enumerators.
- Make the device to pinmux map a singleton - only allow the
  mapping table to be registered once and even tag the
  registration function with __init so it surely won't be
  abused.
- Create a separate debugfs file to view the pinmux map at
  runtime.
- Introduce a spin lock to the pin descriptor struct, lock it
  when modifying pin status entries. Reported by Stijn Devriendt.
- Fix up the documentation after review from Stephen Warren.
- Let the GPIO ranges give names as const char * instead of some
  fixed-length string.
- add a function to unregister GPIO ranges to mirror the
  registration function.
- Privatized the struct pinctrl_device and removed it from the
  <linux/pinctrl/pinctrl.h> API, the drivers do not need to know
  the members of this struct. It is now in the local header
  "core.h".
- Rename the concept of "anonymous" mux maps to "system" muxes
  and add convenience macros and documentation.

ChangeLog v7->v8:

- Delete the leftover pinmux_config() function from the
 <linux/pinctrl/pinmux.h> header.
- Fix a race condition found by Stijn Devriendt in pin_request()

ChangeLog v8->v9:

- Drop the bus_type and the sysfs attributes and all, we're not on
  the clear about how this should be used for e.g. userspace
  interfaces so let us save this for the future.
- Use the right name in MAINTAINERS, PIN CONTROL rather than
  PINMUX
- Don't kfree() the device state holder, let the .remove() callback
  handle this.
- Fix up numerous kerneldoc headers to have one line for the function
  description and more verbose documentation below the parameters

ChangeLog v9->v10:
- pinctrl: EXPORT_SYMBOL needs export.h, folded in a patch
  from Steven Rothwell
- fix pinctrl_register error handling, folded in a patch from
  Axel Lin
- Various fixes to documentation text so that it's consistent.
- Removed pointless comment from drivers/Kconfig
- Removed dependency on SYSFS since we removed the bus in
  v9.
- Renamed hopelessly abbreviated pctldev_* functions to the
  more verbose pinctrl_dev_*
- Drop mutex properly when looking up GPIO ranges
- Return NULL instead of ERR_PTR() errors on registration of
  pin controllers, using cast pointers is fragile. We can
  live without the detailed error codes for sure.

Cc: Stijn Devriendt <highguy@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-10-13 12:49:17 +02:00
Murali Raja
3ceca74966 net-netlink: Add a new attribute to expose TOS values via netlink
This patch exposes the tos value for the TCP sockets when the TOS flag
is requested in the ext_flags for the inet_diag request. This would mainly be
used to expose TOS values for both for TCP and UDP sockets. Currently it is
supported for TCP. When netlink support for UDP would be added the support
to expose the TOS values would alse be done. For IPV4 tos value is exposed
and for IPV6 tclass value is exposed.

Signed-off-by: Murali Raja <muralira@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-12 19:09:18 -04:00
Hans Schillstrom
ae1d48b23d IPVS netns shutdown/startup dead-lock
ip_vs_mutext is used by both netns shutdown code and startup
and both implicit uses sk_lock-AF_INET mutex.

cleanup CPU-1         startup CPU-2
ip_vs_dst_event()     ip_vs_genl_set_cmd()
 sk_lock-AF_INET     __ip_vs_mutex
                     sk_lock-AF_INET
__ip_vs_mutex
* DEAD LOCK *

A new mutex placed in ip_vs netns struct called sync_mutex is added.

Comments from Julian and Simon added.
This patch has been running for more than 3 month now and it seems to work.

Ver. 3
    IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
    instead of __ip_vs_mutex as sugested by Julian.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-10-12 18:32:15 +02:00
Ingo Molnar
910e94dd0c Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core 2011-10-12 17:14:47 +02:00
Johannes Berg
73b9f03a81 mac80211: parse radiotap header earlier
We can now move the radiotap header parsing into
ieee80211_monitor_start_xmit(). This moves it out of
the hotpath, and also helps the code since now the
radiotap header will no longer be present in
ieee80211_xmit() etc. which is easier to understand.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-11 16:41:19 -04:00
Johannes Berg
a26eb27ab4 mac80211: move fragment flag to info flag as dont-fragment
The purpose of this is two-fold:
 1) by moving it out of tx_data.flags, we can in
    another patch move the radiotap parsing so it
    no longer is in the hotpath
 2) if a device implements fragmentation but can
    optionally skip it, the radiotap request for
    not doing fragmentation may be honoured

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-10-11 16:41:19 -04:00