The same loop to seek for a key were used on different places. Also,
no spinlock were protecting it to avoid the risk of replacing a keycode
while seeking for a new code.
This cleanup does:
- create an unique function to seek for a code;
- adds an spinlock to protect the table lookup;
- remove some unused code;
- simplifies to code to make it easier to understand.
Basically no change in behavior should be noticed after this patch.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Currently, the IR table is initialized by calling ir_input_init(). However,
this function doesn't return any error code, nor has a function to be called
when de-initializing the IR's.
Change the return argment to integer and make sure that each driver will
handle the error code. Also adds a function to free any resources that may
be allocating there: ir_input_free().
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that V4L drivers can support more than 7 bits for scan code, let's
add a modified version for the Hauppauge Grey IR containing the full IR
scancode.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Now that the IR conversion to dynamic tables has finished, we can get
rid of some fields and definitions that aren't used anymore.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
V4L drivers use an static keycode vector with 128 entries, where the scancode
indexes the keycode. While this works, it limits the scancodes to have only
7 bits, not allowing for example full RC5 codes.
Instead of implementing the same code on every V4L driver, provide a common
infrastructure to handle the bigger tables, minimizing the changes inside
each driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
As newer IR common code will be added on other files, we need a global
debug var inside the module.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The I2C adapter ID is actually depends on Board and may vary, Davinci
uses id=1, but in case of AM3517 id=3.
Signed-off-by: Vaibhav Hiremath <hvaibhav@ti.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
For whatever reason, the device structure pointer to
videobuf_queue_vmalloc_init is typed "void *", even though it's passed
right through to videobuf_queue_core_init(), which expects a struct
device pointer. The other videobuf implementations use struct device *;
I think vmalloc should too.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The videobuf_queue_ops function vector is not declared constant, but
there's no need for the videobuf layer to ever change it. Make it const
so that videobuf users can make their operations const without warnings.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Upcoming I2C v4l2_subdev drivers need a way to control the subdevice
power state from the core. This use case is already partially covered by
the tuner s_standby operation, but no way to explicitly come back from
the standby state is available.
Rename the tuner s_standby operation to core s_power, and fix tuner
drivers accordingly. The tuner core will call s_power(0) instead of
s_standby(). No explicit call to s_power(1) is required for tuners as
they are supposed to wake up from standby automatically.
[mchehab@redhat.com: CodingStyle fix]
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds a new subdriver to gspca supporting cams with the stv0680
bridge (replacing the old in kernel v4l1 driver).
Many thanks to Hans Verkuil for providing me with one of the 2 cams used in
testing this new sub driver.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This patch adds support for TEF6862 Car Radio Enhanced Selectivity Tuner.
It's implemented as a subdev, supporting checking signal strength
and setting and getting frequency.
Signed-off-by: Richard Röjfors <richard.rojfors@mocean-labs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The patch brings infrared remote support for some cx88 based cards.
Such as TeVii S460,S420.
Signed-off-by: Igor M. Liplianin <liplianin@me.by>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The patch brings infrared remote support for some cx88 based cards.
Such as:
TeVii S460,S420; Omicom SS4; SatTrade ST4200;
TBS 8920,8910; Prof 7300,6200.
Signed-off-by: Igor M. Liplianin <liplianin@me.by>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This adds an soc-camera / v4l2-subdev driver for the RJ54N1CB0C CMOS camera
sensor from Sharp. The sensor is very picky about initialisation and
configuration sequences. The driver limits artificially maximum window size by
800x600, although the sensor supports 1600x1200. Sizes above 800x600 don't seem
to work correctly, besides, examples from the system integrator use sizes above
640x480 only for still photography. Unfortunately, I had to use "magic"
register-value pairs for undocumented and "reserved" registers. This version of
the driver also omits some functionality, like cropping, which hasn't been
sufficiently tested yet and will be added later.
create mode 100644 drivers/media/video/rj54n1cb0c.c
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add v4l2_subdev_ir_ops and IR notify defines for v4l2_device. This change
is specifically needed at this time to support the integrated IR controller in
the CX2388[58] chips.
Signed-off-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Add identifiers for CX2388[578] chips, CX2310[012] chips, integrated
A/V decoders cores, integrated IR controller core, and the CX23417
MPEG encoder. The cx23885 module and cx25840 module will use these
identifiers in upcoming changes.
Signed-off-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
tracing: Separate raw syscall from syscall tracer
ring-buffer-benchmark: Add parameters to set produce/consumer priorities
tracing, function tracer: Clean up strstrip() usage
ring-buffer benchmark: Run producer/consumer threads at nice +19
tracing: Remove the stale include/trace/power.h
tracing: Only print objcopy version warning once from recordmcount
tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
ring-buffer: Move access to commit_page up into function used
tracing: do not disable interrupts for trace_clock_local
ring-buffer: Add multiple iterations between benchmark timestamps
kprobes: Sanitize struct kretprobe_instance allocations
tracing: Fix to use __always_unused attribute
compiler: Introduce __always_unused
tracing: Exit with error if a weak function is used in recordmcount.pl
tracing: Move conditional into update_funcs() in recordmcount.pl
tracing: Add regex for weak functions in recordmcount.pl
tracing: Move mcount section search to front of loop in recordmcount.pl
tracing: Fix objcopy revision check in recordmcount.pl
tracing: Check absolute path of input file in recordmcount.pl
tracing: Correct the check for number of arguments in recordmcount.pl
...
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: Fix trace_marker output
tracing: Fix event format export
tracing: Fix return value of tracing_stats_read()
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
rcu: Make RCU's CPU-stall detector be default
rcu: Add expedited grace-period support for preemptible RCU
rcu: Enable fourth level of TREE_RCU hierarchy
rcu: Rename "quiet" functions
rcu: Re-arrange code to reduce #ifdef pain
rcu: Eliminate unneeded function wrapping
rcu: Fix grace-period-stall bug on large systems with CPU hotplug
rcu: Eliminate __rcu_pending() false positives
rcu: Further cleanups of use of lastcomp
rcu: Simplify association of forced quiescent states with grace periods
rcu: Accelerate callback processing on CPUs not detecting GP end
rcu: Mark init-time-only rcu_bootup_announce() as __init
rcu: Simplify association of quiescent states with grace periods
rcu: Rename dynticks_completed to completed_fqs
rcu: Enable synchronize_sched_expedited() fastpath
rcu: Remove inline from forward-referenced functions
rcu: Fix note_new_gpnum() uses of ->gpnum
rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter
rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
...
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ratelimit: Make suppressed output messages more useful
printk: Remove ratelimit.h from kernel.h
ratelimit: Fix/allow use in atomic contexts
ratelimit: Use per ratelimit context locking
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix missing conditions to build mutex_spin_on_owner()
mutex: Better control mutex adaptive spinning config
locking, task_struct: Reduce size on TRACE_IRQFLAGS and 64bit
locking: Use __[SPIN|RW]_LOCK_UNLOCKED in [spin|rw]_lock_init()
locking: Remove unused prototype
locking: Reduce ifdefs in kernel/spinlock.c
locking: Make inlining decision Kconfig based
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
x86/amd-iommu: Remove amd_iommu_pd_table
x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
x86/amd-iommu: Cleanup DTE flushing code
x86/amd-iommu: Introduce iommu_flush_device() function
x86/amd-iommu: Cleanup attach/detach_device code
x86/amd-iommu: Keep devices per domain in a list
x86/amd-iommu: Add device bind reference counting
x86/amd-iommu: Use dev->arch->iommu to store iommu related information
x86/amd-iommu: Remove support for domain sharing
x86/amd-iommu: Rearrange dma_ops related functions
x86/amd-iommu: Move some pte allocation functions in the right section
x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
x86/amd-iommu: Use get_device_id and check_device where appropriate
x86/amd-iommu: Move find_protection_domain to helper functions
x86/amd-iommu: Simplify get_device_resources()
x86/amd-iommu: Let domain_for_device handle aliases
x86/amd-iommu: Remove iommu specific handling from dma_ops path
x86/amd-iommu: Remove iommu parameter from __(un)map_single
x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
...
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (31 commits)
GFS2: Fix glock refcount issues
writeback: remove unused nonblocking and congestion checks (gfs2)
GFS2: drop rindex glock to refresh rindex list
GFS2: Tag all metadata with jid
GFS2: Locking order fix in gfs2_check_blk_state
GFS2: Remove dirent_first() function
GFS2: Display nobarrier option in /proc/mounts
GFS2: add barrier/nobarrier mount options
GFS2: remove division from new statfs code
GFS2: Improve statfs and quota usability
GFS2: Use dquot_send_warning()
VFS: Export dquot_send_warning
GFS2: Add set_xquota support
GFS2: Add get_xquota support
GFS2: Clean up gfs2_adjust_quota() and do_glock()
GFS2: Remove constant argument from qd_get()
GFS2: Remove constant argument from qdsb_get()
GFS2: Add proper error reporting to quota sync via sysfs
GFS2: Add get_xstate quota function
GFS2: Remove obsolete code in quota.c
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits)
TOMOYO: Add recursive directory matching operator support.
remove CONFIG_SECURITY_FILE_CAPABILITIES compile option
SELinux: print denials for buggy kernel with unknown perms
Silence the existing API for capability version compatibility check.
LSM: Move security_path_chmod()/security_path_chown() to after mutex_lock().
SELinux: header generation may hit infinite loop
selinux: Fix warnings
security: report the module name to security_module_request
Config option to set a default LSM
sysctl: require CAP_SYS_RAWIO to set mmap_min_addr
tpm: autoload tpm_tis based on system PnP IDs
tpm_tis: TPM_STS_DATA_EXPECT workaround
define convenient securebits masks for prctl users (v2)
tpm: fix header for modular build
tomoyo: improve hash bucket dispersion
tpm add default function definitions
LSM: imbed ima calls in the security hooks
SELinux: add .gitignore files for dynamic classes
security: remove root_plug
SELinux: fix locking issue introduced with c6d3aaa4e3
...
Starting with version 4.5, GCC has a new built-in function
__builtin_unreachable() that can be used in places like the kernel's
BUG() where inline assembly is used to transfer control flow. This
eliminated the need for an endless loop in these places.
The patch adds a new macro 'unreachable()' that will expand to either
__builtin_unreachable() or an endless loop depending on the compiler
version.
Change from v1: Simplify unreachable() for non-GCC 4.5 case.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.
Add a function to pci core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; iommu has only been detected not initialized.
Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.
Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove 'port_type' field in struct pcie_port_data(), because we can
get port type information from struct pci_dev. With this change, this
patch also does followings:
- Remove struct pcie_port_data because it no longer has any field.
- Remove portdrv private definitions about port type (PCIE_RC_PORT,
PCIE_SW_UPSTREAM_PORT and PCIE_SW_DOWNSTREAM_PORT), and use generic
definitions instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch cleans up the service irqs initialization as follows:
- Remove 'irq_mode' field in pcie_port_data and related definitions,
which is not needed because we can get the same information from
'is_msix', 'is_msi' and 'pin' fields in struct pci_dev.
- Change the name of 'vectors' argument of assign_interrupt_mode() to
'irqs' because it holds irq numbers actually. People might confuse
it with CPU vector or MSI/MSI-X vector.
- Change function name assign_interrupt_mode() to init_service_irqs()
becasuse we no longer have 'irq_mode' data structure, and new name
is more straightforward (IMO).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
While the target reset task management function has been deprecated in
newer specs, it is still in use by SCSI FC drivers and there is no
real replacement. Add the target reset flag to the FCP header file to
allow usage of this definition in SCSI FC drivers.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add a member function pointer as get_lesb to libfc_function_template so LLD
can fill the LESB based on its own statistics. For fcoe, it fills the LESB
as a fcoe_fc_els_lesb struct according to FC-BB-5.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add struct fcoe_fc_els_lesb as described in FC-BB-5 LESB for FCoE. It has
the same size as LESB defined in FC-FS-3 (struct fc_els_lesb) but members
have different meanings according to FC-BB-5.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When the D bit is set if the FKA_ADV_Period of the FIP Discovery
Advertisement, the ENode should not transmit period ENode FIP Keep Alive and
VN_Port FIP Keep Alive (FC-BB-5 Rev2, 7.8.3.13).
Note that fcf->flags is taken directly from the fip_header, I am claiming one
bit for the purpose of the FIP_FKA_Period D bit as FIP_FL_FK_ADV_B, and use
FIP_HEADER_FLAGS as bitmask for bits used in fip_header.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
FC-BB-5 Rev2.0, Clause 7.10 extends the FC-LS-3 LESB for FC-BB_E. We are
already tracking Link Failure Count so add the rest in this patch.
For VLinkFailureCount and MissDiscAdvCount, they are part of the per-cpu
fcoe_dev_stats. For SymbolErrorCount, ErroredBlockCount, and FCSErrorCount,
they are defined in IEEE 802.3-2008 and are per LLD. They are expected to
come from LLD.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
include/scsi/osd_protocol.h uses ALIGN() without an #include
<linux/kernel.h>, leading to:
| include/scsi/osd_protocol.h:362: error: implicit declaration of function 'ALIGN'
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Administer some love to the osd_req_decode_sense function
* Fix a bad bug with osd_req_decode_sense(). If there was no scsi
residual, .i.e the request never reached the target, then all the
osd_sense_info members where garbage.
* Add grossly missing in/out_resid to osd_sense_info and fill them in
properly.
* Define an osd_err_priority enum which divides the possible errors into
7 categories in ascending severity. Each category is also assigned a
Linux return code translation.
Analyze the different osd/scsi/block returned errors and set the
proper osd_err_priority and Linux return code accordingly.
* extra check a few situations so not to get stuck with inconsistent
error view. Example an empty residual with an error code, and other
places ...
Lots of libosd's osd_req_decode_sense clients had this logic in some
form or another. Consolidate all these into one place that should
actually know about osd returns. Thous translating it to a more
abstract error.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Define an osd_dev_info structure that Uniquely identifies an OSD
device lun on the network. The identification is built from unique
target attributes and is the same for all network/SAN machines.
osduld_info_lookup() - NEW
New API that will lookup an osd_dev by its osd_dev_info.
This is used by pNFS-objects for cross network global device
identification. And by exofs multy-device support, the device
info is specified in the on-disk exofs device table.
osduld_device_info() - NEW
Given an osd_dev handle returns its associated osd_dev_info.
The ULD fetches this information at startup and hangs it on
each OSD device. (This is a fast operation that can be called
at any condition)
osduld_device_same() - NEW
With a given osd_dev at one hand and an osd_dev_info
at another, we would like to know if they are the same
device.
Two osd_dev handles can be checked by:
osduld_device_same(od1, osduld_device_info(od2));
osd_auto_detect_ver() - REVISED
Now returns an osd_dev_info structure. Is only called once
by ULD as before. See added comments for how to use.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The true logic of this patch will be clear in the next patch where we
use the class_find_device() API. When doing so the use of an internal
kref leaves us a narrow window where a find is started while the actual
object can go away. Using the device's kobj reference solves this
problem because now the same kref is used for both operations. (Remove
and find)
Core changes
* Embed a struct device in uld_ structure and use device_register
instead of devie_create. Set __remove to be the device release
function.
* __uld_get/put is just get_/put_device. Now every thing is accounted
for on the device object. Internal kref is removed.
* At __remove() we can safely de-allocate the uld_ structure. (The
function has moved to avoid forward declaration)
Some cleanups
* Use class register/unregister is cleaner for this driver now.
* cdev ref-counting games are no longer necessary
I have incremented the device version string in case of new bugs.
Note: Previous bugfix of taking the reference around fput() still
applies.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add one more important cdb_field_offset that can be returned with
scsi_invalid_field_in_cdb. It is the offset of the permissions_bit_mask
field in the capabilities structure.
Interestingly, the offset is the same for V1/V2
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
define a new osd_dev_is_ver1 that operates on devices
and the old osd_req_is_ver1 uses that new API.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This implements warm target reset tmf support for
the scsi-ml target reset callback. Previously we would
just drop the session in that callback. This patch will
now try a target reset and if that fails drop the session.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Patch and mail from both MikeC and HannesR:
Before we're trying to send a PDU we have to check whether a TMF
is active. If so and if the PDU will be affected by the TMF
we should allow only Data-out PDUs to be sent.
If fast_abort is set, no Data-out PDUs will be sent while
a LUN reset is being processed for a affected LUN.
fast_abort is now ingored during a ABORT TASK tmf. We will not
send any Data-outs for a task if the task is being aborted.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Some of our virtual SCSI hosts don't have a proper bus parent at the
top, which can be a problem for doing DMA on them
This patch makes the host device cache a pointer to the physical bus
device and provides an extra API for setting it (the normal API picks
it up from the parent). This patch also modifies the qla2xxx and lpfc
vport logic to use the new DMA host setting API.
Acked-By: James Smart <james.smart@emulex.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Customers and certification tests have pointed out that we don't
show up on the switch management software as an initiator.
On some MDS switches 'show fcns database' command shows libfc
initiators as 'fcp' not 'fcp:init' like other initiators.
On others switches, I think the switch gets the features by doing a PRLI,
but it may be only certain models or under certain configurations.
Fix this by registering our FC4 features with the RFF_ID CT request
after local port login and after the RFT_ID.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There was a locking problem where the fip->lock was held during
the call to update_mac(). The rtnl_lock() must be taken before
the fip->lock, not the other way around. This fixes that.
Now that fcoe_ctlr_recv_flog() is called only from the response handler
to a FLOGI request, some checking can be eliminated. Instead of calling
update_mac(), just fill in the granted_mac address for the passed-in
frame (skb).
Eliminate the passed-in source MAC address since it is also in the skb.
Also, in fcoe, call fcoe_set_src_mac() directly instead of going thru
the fip function pointer. This will generate less code.
Then, since fip isn't needed for LOGO response, use lport as the arg.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is to notify the LLD when an FC_ID is assigned to the local port.
The fnic driver needs to push the assigned FC_ID to firmware.
It currently does this by intercepting the FLOGI responses, and
in order to make that code more common with FIP and NPIV, it
makes more sense to wait until the local port has completely
handled the FLOGI or FDISC response. Also, when we fix
point-to-point FC_ID assignment, we'll need this callback as well.
Add a call to the libfc template, which is called whenever
the local port FC_ID is being assigned. It defaults to
fc_lport_set_fid(), supplied by libfc.
As additional benefit of this function, the LLD may determine
the MAC address that caused the change by looking at the received frame.
We also print the assigned port ID as long as it isn't 0.
Setting port ID to 0 happens often in reset while retrying FLOGI,
and would be uninteresting. This replaces the previous message
which didn't identify the host adapter instance.
patch v2 note: changed one word in a comment. "intercepted" -> "provided".
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The strncpy for RSPN_ID and RSNN_NN requests was padding
past the allocated frame size.
Get the string length before filling in the ct header.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The code that filled in the name server RNN_ID (register node name)
request had somehow gotten a line in it from the RFT_ID code
which copies 32 bytes of data over the relatively short payload.
This caused some corruption and hangs.
Simply deleted the extraneous line.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The fnic driver with FIP is reporting link up, even though it's down.
When the interface is shut down by the switch, we receive a clear
virtual link, and set the state reported to libfc as down, although
we still report it up. Clearly wrong. That causes the subsequent
link down event not to be reported, and /sys shows the host "Online".
Currently, in FIP mode, if an FCF times out, then link to libfc
is reported as down, to stop FLOGIs. That interferes with the LLD
link down being reported.
Users really need to know the physical link information, to diagnose
cabling issues, so physical link status should be reported to libfc.
If the selected FCF needs to be reported, that should be done
separately, in a later patch.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Allow FIP to be disabled by the driver for devices
that want to use libfcoe in non-FIP mode.
The driver merely sets the fcoe_ctlr mode to the state which
should be entered when the link comes up. The default is auto.
No change is needed for fcoe.c which uses auto mode.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cleans up frame allocation APIs to have just single fc_frame_alloc API.
Removes _fc_frame_alloc, renames __fc_frame_alloc to _fc_frame_alloc.
Modifies fc_fcp_send_data for removed _fc_frame_alloc, fc_fcp_send_data
was the only user of removed _fc_frame_alloc.
Also Adds check in fc_frame_alloc to do mod by 4 for only non-zero
len value.
This patch is prep work to fix can_queue reducing in next patch.
Single fc_frame_alloc API helps in fixing can_queue reducing in
next patch.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Ensures that there are kernel-doc style comments for all
routines and structures.
There were also a few instances of fc_lport's named 'lp'
which were switched to 'lport' as per the libfc/libfcoe/fcoe
naming convention.
Also, emacs 'indent-region' and 'tabify' were ran on libfcoe.c.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch makes a variety of cleanup changes to all libfc files.
This patch adds kernel-doc headers to all functions lacking them
and attempts to better format existing headers. It also add kernel-doc
headers to structures.
This patch ensures that the current naming conventions for local ports,
remote ports and remote port private data is upheld in the following
manner.
struct instance (i.e. variable name)
--------------------------------------------------
fc_lport lport
fc_rport rport
fc_rport_libfc_priv rpriv
fc_rport_priv rdata
I also renamed dns_rp and ptp_rp to dns_rdata and ptp_rdata
respectively.
I used emacs 'indent-region' and 'tabify' on all libfc files
to correct spacing alignments.
I feel sorry for anyone attempting to review this patch.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is the Open-FCoE implementation of the FC
passthrough support via bsg interface.
Passthrough support is added to both N_Ports and
VN_Ports.
Signed-off-by: Steve Ma <steve.ma@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Export fc_els.h, fc_fs.h, fc_gs.h and fc_ns.h so that they
may be used by applications.
This will be needed for FC Passthrough applications like fcping,
but could be used by other applications.
Fix to include <linux/types.h> to exported files provided by
Chris Leech <christopher.leech@intel.com>.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Register the fc_host symbolic name as the symbolic port name
with the fabric name server.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Register the fc_host symbolic name as the symbolic node name
with the fabric name server.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
One could interpret FC-GS-5 to say that an explicit RNN_ID is required
before RSNN_NN is allowed to succeed, which is why RNN_ID was not obsoleted
along with RPN_ID acording to this document:
ftp://ftp.t11.org/t11/member/fc/gs-5/05-546v2.pdf
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
RPN_ID has been obsolete per FC-GS-5 for several years. The port name is
registered implicitly as part of FLOGI, and it is undesirable for ports to
change a registered port name using RPN_ID while logged into the fabric.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The FIP code in libfcoe needed several changes to support NPIV
1) dst_src_addr needs to be managed per-n_port-ID for FPMA fabrics with NPIV
enabled. Managing the MAC address is now handled in fcoe, with some slight
changes to update_mac() and a new get_src_addr() function pointer.
2) The libfc elsct_send() hook is used to setup FCoE specific response
handlers for FIP encapsulated ELS exchanges. This lets the FCoE specific
handling know which VN_Port the exchange is for, and doesn't require
tracking OX_IDs. It might be possible to roll back to the full FIP frame
in these, but for now I've just stashed the contents of the MAC address
descriptor in the skb context block for later use. Also, because
fcoe_elsct_send() just passes control on to fc_elsct_send(), all transmits
still come through the normal frame_send() path.
3) The NPIV changes added a mutex hold in the keep alive sending, the lport
mutex is protecting the vport list. We can't take a mutex from a timer,
so move the FIP keep alive logic to the link work struct.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add FDISC ELS handling to libfc and libfcoe, treat it the same as FLOGI where
appropriate.
Add checking for NPIV support in the FLOGI LS_ACC service parameters.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
NPIV vports are managed in libfc by changing their virtual link state
when the parent N_Ports internal state changes. The vport link is only
online when the N_Port is in a ready state (logged into the fabric).
vport_state is updated as needed in this patch as well, currently the states
LINKDOWN, INITIALIZING, ACTIVE, DSIABLED, and NO_FABRIC_SUPP are used.
This also changes the fc_host port_state handling to differentiate between
LINKDOWN and OFFLINE.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Adds a function to create a new VN_Port instances, which share the EM
list with the N_Port, VN_Port lookup by fabric ID when responding to a new
request (otherwise the exchange lookup from the N_Ports EM list is trusted to
return an exchange with a cached lport value for the correct VN_Port),
a pointer to a fc_vport structure for VN_Ports, and flags to indicate if an
N_Port supports NPIV and if the switch/fabric allows it.
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
I'd like to keep basic initialization together with allocation, which means
this can't just be a tail-call to scsi_host_alloc.
This is needed to create a generic libfc host allocation routine for NPIV
VN_Ports, which will share the exchange ID space (through sharing exchange
manager structures) with the parent lport. In order to clone the exchange
manager list when the lport is allocated, the list head must be initialized
earlier.
Also, update fnic to use the libfc_host_alloc so that later changes do not break
it. (contribution by Joe Eykholt)
Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
include/scsi/libfc.h is currently loaded with common code
shared between libfc's sub-modules as well as shared between
libfc and fcoe. Previous patches attempted to move out
non-common code. This patch creates two files for common
libfc routines that will not be shared with fcoe, fnic or
any other LLDs.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This function is never used, let's remove it.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch moves all non-common routines and function prototypes
out of libfc.h and into the appropriate .c files. It makes these
routines 'static' when necessary and removes any unnecessary EXPORT_SYMBOL
statements.
A result of moving the fc_exch_seq_send, fc_seq_els_rsp_send, fc_exch_alloc
and fc_seq_start_next prototypes out of libfc.h is that they were no longer
being imported into fc_exch.c when libfc.h was included. This caused errors
where routines in fc_exch.c were looking for undefined symbols. To fix this
this patch reorganizes fc_seq_alloc, fc_seq_start_next and
fc_seq_start_next_locked. This move also made it so that
fc_seq_start_next_locked did not need to be prototyped at the top of
fc_exch.c.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Move the duplicated code from FC LLDs to SCSI FC transport class.
Acked-by: James Smart <james.smart@emulex.com>
Acked-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Acked-by: Abhijeet Joglekar <abjoglek@cisco.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Make scsi_dh_activate() function asynchronous, by taking in two additional
parameters, one is the callback function and the other is the data to call
the callback function with.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Current FC HBA queue_depth ramp up code depends on last queue
full time. The sdev already has last_queue_full_time field to
track last queue full time but stored value is truncated by
last four bits.
So this patch updates last_queue_full_time without truncating
last 4 bits to store full value and then updates its only
current usages in scsi_track_queue_full to ignore last four bits
to keep current usages same while also use this field
in added ramp up code.
Adds scsi_handle_queue_ramp_up to ramp up queue_depth on
successful completion of IO. The scsi_handle_queue_ramp_up will
do ramp up on all luns of a target, just same as ramp down done
on all luns on a target.
The ramp up is skipped in case the change_queue_depth is not
supported by LLD or already reached to added max_queue_depth.
Updates added max_queue_depth on every new update to default
queue_depth value.
The ramp up is also skipped if lapsed time since either last
queue ramp up or down is less than LLD specified
queue_ramp_up_period.
Adds queue_ramp_up_period to sysfs but only if change_queue_depth
is supported since ramp up and queue_ramp_up_period is needed only
in case change_queue_depth is supported first.
Initializes queue_ramp_up_period to 120HZ jiffies as initial
default value, it is same as used in existing lpfc and qla2xxx.
-v2
Combined all ramp code into this single patch.
-v3
Moves max_queue_depth initialization after slave_configure is
called from after slave_alloc calling done. Also adjusted
max_queue_depth check to skip ramp up if current queue_depth
is >= max_queue_depth.
-v4
Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f
to store or show its value.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch modifies scsi_host_template->change_queue_depth so that
it takes an argument indicating why it is being called. This will be
used so that if a LLD needs to do some extra processing when
handling queue fulls or later ramp ups, it can do so.
This is a simple port of the drivers setting a change_queue_depth
callback. In the patch I just have these LLDs adjust the queue depth
if the user was requesting it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[Vasu.Dev: v2
Also converted pmcraid_change_queue_depth and then verified
all modules compile using "make allmodconfig" for any new build
warnings on X86_64.
Updated original description after combing two original
patches from Mike to make this patch git bisectable.]
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
[jejb: fixed up 53c700]
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This driver supports PMC-Sierra PCIe SAS/SATA 8x6G SPC 8001 chip based
host adapters.
Signed-off-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Lindar Liu <lindar_liu@usish.com>
Signed-off-by: Tom Peng <tom_peng@usish.com>
Signed-off-by: Kevin Ao <aoqingyun@usish.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Timer crashes were caused by freeing a struct fc_rport_priv
with a timer pending, causing the timer facility list to be
corrupted. This was during FC uplink flap tests with a lot
of targets.
After discovery, we were doing an PLOGI on an rdata that was
in DELETE state but not yet removed from the lookup list.
This moved the rdata from DELETE state to PLOGI state.
If the PLOGI exchange allocation failed and needed to be
retried, the timer scheduling could race with the free
being done by fc_rport_work().
When fc_rport_login() is called on a rport in DELETE state,
move it to a new state RESTART. In fc_rport_work, when
handling a LOGO, STOPPED or FAILED event, look for restart
state. In the RESTART case, don't take the rdata off the
list and after the transport remote port is deleted and
exchanges are reset, re-login to the remote port.
Note that the new RESTART state also corrects a problem we
had when re-discovering a port that had moved to DELETE state.
In that case, a new rdata was created, but the old rdata
would do an exchange manager reset affecting the FC_ID
for both the new rdata and old rdata. With the new state,
the new port isn't logged into until after any old exchanges
are reset.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
In case of sequence offload, in fc_fcp_send_data(), the skb_fill_page_info()
called may end up adding more frags to the skb_shinfo(fp_skb(fp))->frags[],
exceeding SKB_MAX_FRAGS, this eventually corrupts the memory. I am adding the
FR_FRAME_SG_LEN back, but as SKB_MAX_FRAGS -1, leaving 1 for our fcoe_eof_crc
page. And send will be broken into multiple large sends if the frame already
contains more frags than skb handle.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Reported-by: Alex Lyakas <alexl@mellanox.co.il>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add definitions for UNMAP, WRITE SAME{16,32} and GET LBA STATUS
commands.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
With CLONE_IO, parent's io_context->nr_tasks is incremented, but never
decremented whenever copy_process() fails afterwards, which prevents
exit_io_context() from calling IO schedulers exit functions.
Give a task_struct to exit_io_context(), and call exit_io_context() instead of
put_io_context() in copy_process() cleanup path.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cleanup the usage of DBE_VT_SIZE since the kernel already defines the
same macro for the same propose.
Also clean up a surrounding whitespaces.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Most users (except sh_mobile_lcdcfb.c) get their framebuffer from
vmalloc. Setting VM_IO is not necessary as the memory obtained
from vmalloc is System RAM type and is not susceptible to PCI memory
constraints.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Its currently possible that several threads issuing a connect() find
the same timewait socket and try to reuse it, leading to list
corruptions.
Condition for bug is that these threads bound their socket on same
address/port of to-be-find timewait socket, and connected to same
target. (SO_REUSEADDR needed)
To fix this problem, we could unhash timewait socket while holding
ehash lock, to make sure lookups/changes will be serialized. Only
first thread finds the timewait socket, other ones find the
established socket and return an EADDRNOTAVAIL error.
This second version takes into account Evgeniy's review and makes sure
inet_twsk_put() is called outside of locked sections.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide common routine for the transition of operational state for a leaf
device during a root device transition.
Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using remote wakeup and delayed transmission to allow
online device to go into usb autosuspend.
Minimal alternate support for devices that don't support
remote wakeup.
Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds a ioctl and property to allow userspace
to notify the kernel that a framebuffer has changed. Instead
of snooping the command stream this allows finer grained
tracking of which areas have changed.
The primary user for this functionality is virtual hardware
like the vmware svga device, but also Xen hardware likes to
be notify. There is also real hardware like DisplayLink and
DisplayPort that might take advantage of this ioctl.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
drm/ttm fails to build on MIPS because "struct page" is not known:
| In file included from drivers/gpu/drm/ttm/ttm_memory.c:28:
| include/drm/ttm/ttm_memory.h:154: warning: 'struct page' declared inside parameter list
| include/drm/ttm/ttm_memory.h:154: warning: its scope is only this definition or declaration, which is probably not what you want
| include/drm/ttm/ttm_memory.h:156: warning: 'struct page' declared inside parameter list
| drivers/gpu/drm/ttm/ttm_memory.c:540: error: conflicting types for 'ttm_mem_global_alloc_page'
| include/drm/ttm/ttm_memory.h:154: error: previous declaration of 'ttm_mem_global_alloc_page' was here
| drivers/gpu/drm/ttm/ttm_memory.c:561: error: conflicting types for 'ttm_mem_global_free_page'
| include/drm/ttm/ttm_memory.h:156: error: previous declaration of 'ttm_mem_global_free_page' was here
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Cc: stable@kernel.org
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
ata_set_lba_range_entries used the variable max for two different things
which was confusing. Make the function take a buffer size in bytes as
argument and return the used buffer size upon completion.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Our current TRIM payload is a single sector that can accommodate 64 *
65535 blocks being unmapped. Report this value in the Block Limits
Maximum Unmap LBA count field.
If a storage device supports TRIM and the DRAT and RZAT bits are set,
report TPRZ=1 in Read Capacity(16).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This let's use use the linux drm headers as the canonical source for
libdrm on all platforms.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The vmwgfx driver has a per master rw lock around TTM, to guarantee
mutual exclusion when needed.
This is typically when all evictable buffers are evicted due to
1) vt switch
2) master switch
3) suspend / resume.
In the multi-master case, on master switch the new master takes the
previously active master lock in write mode, and then evicts all
buffers. Any clients to previous masters will then block on that lock
when trying to validate a buffer. fbdev also acts as a virtual master
wrt this.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch cleans up the VPD code by creating preprocessor definitions
and using them in the place of hardcoded constants.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function walks the whole hashtable so there is no point in
passing it a network namespace. Instead I purge all timewait
sockets from dead network namespaces that I find. If the namespace
is one of the once I am trying to purge I am guaranteed no new timewait
sockets can be formed so this will get them all. If the namespace
is one I am not acting for it might form a few more but I will
call inet_twsk_purge again and shortly to get rid of them. In
any even if the network namespace is dead timewait sockets are
useless.
Move the calls of inet_twsk_purge into batch_exit routines so
that if I am killing a bunch of namespaces at once I will just
call inet_twsk_purge once and save a lot of redundant unnecessary
work.
My simple 4k network namespace exit test the cleanup time dropped from
roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell
to about 2ms. 1ms for ipv4 and 1ms for ipv6.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code so fib_rules_register always takes a template instead
of the actual fib_rules_ops structure that will be used. This is
required for network namespace support so 2 out of the 3 callers already
do this, it allows the error handling to be made common, and it allows
fib_rules_unregister to free the template for hte caller.
Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu
to allw multiple namespaces to be cleaned up in the same rcu grace
period.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from
other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that
will never be set to NULL. This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add exit_list to struct net to support building lists of network
namespaces to cleanup.
- Add exit_batch to pernet_operations to allow running operations only
once during a network namespace exit. Instead of once per network
namespace.
- Factor opt ops_exit_list and ops_exit_free so the logic with cleanup
up a network namespace does not need to be duplicated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:16:35 2009 +0100
ipv4: add sysctl to accept packets with local source addresses
Change fib_validate_source() to accept packets with a local source address when
the "accept_local" sysctl is set for the incoming inet device. Combined with the
previous patches, this allows to communicate between multiple local interfaces
over the wire.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:25 2009 +0100
net: fib_rules: add oif classification
Support routing table lookup based on the flow's oif. This is useful to
classify packets originating from sockets bound to interfaces differently.
The route cache already includes the oif and needs no changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:23 2009 +0100
net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
The next patch will add oif classification, rename interface related members
and attributes to reflect that they're used for iif classification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit b8952893d5d86f69c4e499d191b98c6658f64b0f
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:22 2009 +0100
net: fib_rules: rearrange struct fib_rule
The ifname member is only used to resolve interface names and is not needed
during rule lookups. The target and ctarget members however are used during
rule lookups and are currently located in a second cacheline.
Move ifname further to the end to make sure both target and ctarget are
located in the same cacheline as other members used during rule lookups.
The layout on 64 bit changes from:
struct fib_rule {
...
u32 table; /* 56 4 */
u8 action; /* 60 1 */
/* XXX 3 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 target; /* 64 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 72 8 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
to:
struct fib_rule {
...
u32 table; /* 40 4 */
u8 action; /* 44 1 */
/* XXX 3 bytes hole, try to pack */
u32 target; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
char ifname[16]; /* 64 16 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were never able to get docs for this out of Toshiba for years. Dave
Barnes produced a NetBSD driver however and from that we can fill in the
needed tables.
As we correct the PCI identifiers a bit also update the old ide generic driver
at the same time so it stays compiling.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
RejActioned is used to prevent retransmission when a entity is on the
WAIT_F state, i.e., waiting for a frame with F-bit set due local busy
condition or a expired retransmission timer. (When these two events raise
they send a frame with the Poll bit set and enters in the WAIT_F state to
wait for a frame with the Final bit set.)
The local entity doesn't send I-frames(the data frames) until the receipt
of a frame with F-bit set. When that happens it also set RejActioned to false.
RejActioned is a mandatory feature of ERTM spec.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As specified by ERTM spec an ERTM channel can acknowledge received
I-frames(the data frames) by sending an I-frame with the proper ReqSeq
value (i.e. ReqSeq is set to BufferSeq). Until now we aren't setting the
ReqSeq value on I-frame control bits. That way we can save sending
S-frames(Supervise frames) only to acknowledge receipt of I-frames. It
is very helpful to the full-duplex channel.
ReqSeq is the packet sequence number sent in an acknowledgement frame to
acknowledge receipt of frames up to (ReqSeq - 1).
BufferSeq controls the receiver buffer, it is used to delay
acknowledgement of new frames to not cause buffer overflow. BufferSeq
value is not increased until frames are pulled by reassembly function.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The tasklet schedule function helpers are just an obfuscation. So remove
them and call the schedule functions directly.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For future simplification it is important that the hci_recv_frame
function is no longer an inline function. So move it into the module
itself and export it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
o This is basic implementation of blkio controller cgroup interface. This is
the common interface visible to user space and should be used by different
IO control policies as we implement those.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It will lower the flush priority for NFS, and maybe more in future.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are two spare field in the header common to all GFS2
metadata. One is just the right size to fit a journal id
in it, and this patch updates the journal code so that each
time a metadata block is modified, we tag it with the journal
id of the node which is performing the modification.
The reason for this is that it should make it much easier to
debug issues which arise if we can tell which node was the
last to modify a particular metadata block.
Since the field is updated before the block is written into
the journal, each journal should only contain metadata which
is tagged with its own journal id. The one exception to this
is the journal header block, which might have a different node's
id in it, if that journal was recovered by another node in the
cluster.
Thus each journal will contain a record of which nodes recovered
it, via the journal header.
The other field in the metadata header could potentially be
used to hold information about what kind of operation was
performed, but for the time being we just zero it on each
transaction so that if we use it for that in future, we'll
know that the information (where it exists) is reliable.
I did consider using the other field to hold the journal
sequence number, however since in GFS2's journaling we write
the modified data into the journal and not the original
data, this gives no information as to what action caused the
modification, so I think we can probably come up with a better
use for those 64 bits in the future.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Sending a message to userspace in a generic format to warn
of events (e.g. quota exceeded) in the quota subsystem is
a generically useful feature. This patch makes some minor
changes to the send_message function from dquot.c renaming
it quota_send_message, moving it to quota.c and exporting it
for use by filesystems which do not use the dquot code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is required for cluster filesystems which want to use
cached ACLs so that they can invalidate the cache when
required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down. However, not all devices return zeroed
blocks after a discard. Some drives return stale data, potentially
containing old superblocks. It is therefore important to know whether
discarded blocks are properly zeroed.
Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation. Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add support for the ATA TRIM command in libata. We translate a WRITE SAME 16
command with the unmap bit set into an ATA TRIM command and export enough
information in READ CAPACITY 16 and the block limits EVPD page so that the new
SCSI layer discard support will driver this for us.
Note that I hardcode the WRITE_SAME_16 opcode for now as the patch to introduce
the symbolic is not in 2.6.32 yet but only in the SCSI tree - as soon as it is
merged we can fix it up to properly use the symbolic name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.
However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.
This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination. Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode. This was reported in the following bugzilla bug.
http://bugzilla.kernel.org/show_bug.cgi?id=14543
This patch makes libata EH retry FLUSH if it wasn't failed by the
device.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch moves s390 processor status word into the base kvm_run
struct and keeps it up-to date on all userspace exits.
The userspace ABI is broken by this, however there are no applications
in the wild using this. A capability check is provided so users can
verify the updated API exists.
Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.
[avi: future-proof abi by adding a flags field]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults. If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.
Signed-off-by: Avi Kivity <avi@redhat.com>
Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available. Add extra data to internal error
reporting so we can expose it to the debugger. Extra data is specific
to the suberror.
Signed-off-by: Avi Kivity <avi@redhat.com>
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.
This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.
A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.
I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use gsi indexed array instead of scanning all entries on each interrupt
injection.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maybe 4.1.0 doesn't too, but this fixed it for me.
Caused by:
4a31276: x86: Turn the copy_from_user check into an (optional) compile time warning
9f0cf4a: x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <200910090724.n997OQl6013538@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Dumazet mentioned in a context of another problem:
"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"
I've not check under which conditions that actually happens but
if true, we need to make sure we don't accidently leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).
...At least it compiles.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse incoming TCP_COOKIE option(s).
Calculate <SYN,ACK> TCP_COOKIE option.
Send optional <SYN,ACK> data.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1d: define TCP cookie option, extend existing struct's
TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1f: Initiator Cookie => Responder
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration). There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
The principle difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data. This is more flexible and
less subject to user configuration error. Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.
"Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
"Re: what a new TCP header might look like", May 12, 1998.
ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
These functions will also be used in subsequent patches that implement
additional features.
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.
Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.
Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).
This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
These functions will also be used in subsequent patches that implement
additional features.
Requires:
net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define (missing) hash message size for SHA1.
Define hashing size constants specific to TCP cookies.
Add new function: tcp_cookie_generator().
Maintain global secret values for tcp_cookie_generator().
This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.
Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.
These functions will also be used in subsequent patches that implement
additional features.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.
Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a patch by Philippe De Muyter, and feedback from Mark
Lord and Robert Hancock.
As noted by Mark Lord, the outdated ATA1 spec specifies a 20msec
timeout for setting DRQ but lots of common devices overshoot this.
Signed-off-by: David S. Miller <davem@davemloft.net>
The two functions skb_dma_map/unmap are unsafe to use as they cause
problems when packets are cloned and sent to multiple devices while a HW
IOMMU is enabled. Due to this it is best to remove the code so it is not
used by any other network driver maintainters.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a real fix for problem of utime/stime values decreasing
described in the thread:
http://lkml.org/lkml/2009/11/3/522
Now cputime is accounted in the following way:
- {u,s}time in task_struct are increased every time when the thread
is interrupted by a tick (timer interrupt).
- When a thread exits, its {u,s}time are added to signal->{u,s}time,
after adjusted by task_times().
- When all threads in a thread_group exits, accumulated {u,s}time
(and also c{u,s}time) in signal struct are added to c{u,s}time
in signal struct of the group's parent.
So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.
And accounted values are used by:
- task_times(), to get cputime of a thread:
This function returns adjusted values that originates from raw
{u,s}time and scaled by sum_exec_runtime that accounted by CFS.
- thread_group_cputime(), to get cputime of a thread group:
This function returns sum of all {u,s}time of living threads in
the group, plus {u,s}time in the signal struct that is sum of
adjusted cputimes of all exited threads belonged to the group.
The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:
group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).
To fix this, we could do:
group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
But task_times() contains hard divisions, so applying it for
every thread should be avoided.
This patch fixes the above problem in the following way:
- Modify thread's exit (= __exit_signal()) not to use task_times().
It means {u,s}time in signal struct accumulates raw values instead
of adjusted values. As the result it makes thread_group_cputime()
to return pure sum of "raw" values.
- Introduce a new function thread_group_times(*task, *utime, *stime)
that converts "raw" values of thread_group_cputime() to "adjusted"
values, in same calculation procedure as task_times().
- Modify group's exit (= wait_task_zombie()) to use this introduced
thread_group_times(). It make c{u,s}time in signal struct to
have adjusted values like before this patch.
- Replace some thread_group_cputime() by thread_group_times().
This replacements are only applied where conveys the "adjusted"
cputime to users, and where already uses task_times() near by it.
(i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
This patch have a positive side effect:
- Before this patch, if a group contains many short-life threads
(e.g. runs 0.9ms and not interrupted by ticks), the group's
cputime could be invisible since thread's cputime was accumulated
after adjusted: imagine adjustment function as adj(ticks, runtime),
{adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
After this patch it will not happen because the adjustment is
applied after accumulated.
v2:
- remove if()s, put new variables into signal_struct.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Remove if({u,s}t)s because no one call it with NULL now.
- Use cputime_{add,sub}().
- Add ifndef-endif for prev_{u,s}time since they are used
only when !VIRT_CPU_ACCOUNTING.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reorder task_struct field for TRACE_IRQFLAGS to remove padding
on 64-bit.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4B135F50.8070302@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
enter_syscall_print_##sname and exit_syscall_print_##sname don't
need to have a global scope. Make them static.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1259734990-9034-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
498657a478 incorrectly assumed
that preempt wasn't disabled around context_switch() and thus
was fixing imaginary problem. It also broke KVM because it
depended on ->sched_in() to be called with irq enabled so that
it can do smp calls from there.
Revert the incorrect commit and add comment describing different
contexts under with the two callbacks are invoked.
Avi: spotted transposed in/out in the added comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: peterz@infradead.org
Cc: efault@gmx.de
Cc: rusty@rustcorp.com.au
LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Keyboard handler should not attempt to traverse handler->h_list on
its own, without any locking, otherwise it races with registering
and unregistering of input handles which leads to crashes.
Introduce input_handler_for_each_handle() helper that allows safely
iterate over all handles attached to a particular handler and switch
keyboard handler to use it.
Reported-by: Jim Paradis <jparadis@redhat.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code. When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.
To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core. The core now frees the data only after
all of the network namespace exit routines have run.
Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.
This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary. For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is fairly common to kill several network namespaces at once. Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments. As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.
To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch. For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.
An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.
In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I will need this shortly to implement network namespace shutdown
batching. For sanity sake network devices should be removed in
the reverse order they were created in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current vblank-wait implementation, if we turn off VGA output,
drm_wait_vblank will still wait on the disabled pipe until timeout,
because vblank on the pipe is assumed be enabled. This would cause
slow system response on some system such as moblin.
This patch resolve the issue by adding a drm helper function
drm_vblank_off which explicitly clear vblank_enabled[crtc], wake up
any waiting queue and save last vblank counter before turning off
crtc. It also slightly change drm_vblank_get to ensure that we will
will return immediately if trying to wait on a disabled pipe.
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: hand-applied for conflicts with overlay changes]
Signed-off-by: Eric Anholt <eric@anholt.net>
Add a GETPARAM request for checking if page flipping is supported.
Useful for the 2D driver to enable the flipping path.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
We don't actually know which frame number the flip will complete on, so
userspace needs a specific flip notification to tell it when the last flip
completed.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
use only one prof_sysenter_enable() instead of
prof_sysenter_enable_##sname()
use only one prof_sysenter_disable() instead of
prof_sysenter_disable_##sname()
use only one prof_sysexit_enable() instead of
prof_sysexit_enable_##sname()
use only one prof_sysexit_disable() instead of
prof_sysexit_disable_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D2A1.8060304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use only one init_syscall_trace instead of
many init_enter_##sname()/init_exit_##sname()
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D29B.6090708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add syscall_nr field to struct syscall_metadata,
it helps us to get syscall number easier.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D293.6090800@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use ->enter_event->id instead of ->enter_id
use ->exit_event->id instead of ->exit_id
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D288.7030001@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Set event_enter_##sname->data to its metadata,
it makes codes simpler.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B14D282.7050709@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>