linux

Author	SHA1	Message	Date
Joerg Roedel	3a18404cd9	iommu/amd: Propagate errors from amd_iommu_init_api This function can fail. Propagate any errors back to the initialization state machine. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:24 +02:00
Joerg Roedel	2870b0a491	iommu/amd: Remove unused fields from struct dma_ops_domain The list_head and target_dev members are not used anymore. Remove them. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:24 +02:00
Joerg Roedel	343e9cac9c	iommu/amd: Get rid of device_dma_ops_init() With device intialization done in the add_device call-back now there is no reason for this function anymore. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:23 +02:00
Joerg Roedel	07ee86948c	iommu/amd: Put IOMMUv2 devices in a direct mapped domain A device that might be used for HSA needs to be in a direct mapped domain so that all DMA-API mappings stay alive when the IOMMUv2 stack is used. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:23 +02:00
Joerg Roedel	07f643a35d	iommu/amd: Support IOMMU_DOMAIN_IDENTITY type allocation Add support to allocate direct mapped domains through the IOMMU-API. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:22 +02:00
Joerg Roedel	0bb6e243d7	iommu/amd: Support IOMMU_DOMAIN_DMA type allocation This enables allocation of DMA-API default domains from the IOMMU core and switches allocation of domain dma-api domain to the IOMMU core too. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:22 +02:00
Joerg Roedel	aafd8ba0ca	iommu/amd: Implement add_device and remove_device Implement these two iommu-ops call-backs to make use of the initialization and notifier features of the iommu core. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:21 +02:00
Joerg Roedel	063071dff5	iommu/amd: Use default domain if available for DMA-API Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:21 +02:00
Joerg Roedel	35cf248f88	iommu/amd: Implement dm_region call-backs Add the get_dm_regions and put_dm_regions callbacks to the iommu_ops of the AMD IOMMU driver. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:42:20 +02:00
Geoff Levand	7161d18f66	net/ps3_gelic: Fix build error with DEBUG When the DEBUG preprocessor macro is defined the ps3_gelic_net driver build fails due to an undeclared routine gelic_descr_get_status(). This problem was introduced during the code cleanup of commit `6b0c21cede` (net: Fix p3_gelic_net sparse warnings), which re-arranged the ordering of some of the gelic routines. This change just moves the gelic_descr_get_status() routine up in the ps3_gelic_net.c source file. There is no functional change. Fixes build errors like these: drivers/net/ethernet/toshiba/ps3_gelic_net.c: error: implicit declaration of function gelic_descr_get_status Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:38:12 -07:00
Hadar Hen Zion	a4244b0cf5	net/ethtool: Add current supported tunable options Add strings array of the current supported tunable options. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:36:37 -07:00
Anton Blanchard	238abecde8	powerpc: Don't use gcc specific options on clang We have code to choose between several options, eg. -mabi=elfv2 vs -mcall-aixdesc, and -mcmodel=medium vs -mminimal-toc. But these are all GCC specific, so use cc-option on all of them. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
Anton Blanchard	92d6cf2dab	powerpc: Don't use -mno-strict-align on clang We added -mno-strict-align in commit `f036b36819` (powerpc: Work around little endian gcc bug) to fix gcc bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57134 Clang doesn't understand it. We need to use a conditional because we can't use the simpler call cc-option here. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
Anton Blanchard	a50a862e74	powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it These options are not recognised on LLVM, so use call cc-option to check for support. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
Anton Blanchard	1fb3f5a7ca	powerpc: Only use -mabi=altivec if toolchain supports it The -mabi=altivec option is not recognised on LLVM, so use call cc-option to check for support. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
Anton Blanchard	b91c1e3e7a	powerpc: Fix duplicate const clang warning in user access code We see a large number of duplicate const errors in the user access code when building with llvm/clang: include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier] ret = __get_user(c, uaddr); The problem is we are doing const __typeof__(*(ptr)), which will hit the warning if ptr is marked const. Removing const does not seem to have any effect on GCC code generation. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 17:33:05 +10:00
David S. Miller	de84725403	Merge branch 'broadcom-MDIO-turn-around' Florian Fainelli says: ==================== net: broadcom MDIO support for broken turn-around These two patches update the GENET and UniMAC MDIO controllers to deal with PHYs that are known to have a broken turn-around bug (e.g: BCM53125 and others) This utilizes the infrastructure that code recently added to do that in 'net-next'. Note that the changes look nearly identical and I will try to address the MDIO code duplication between GENET and UniMAC in a future patch series. Changes in v2: - remove brcmphy.h include in mdio-bcm-unimac.c - use the same comment as with GENET's MDIO read function ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:32:21 -07:00
Florian Fainelli	1a3f4e83bc	net: phy: mdio-bcm-unimac: handle broken turn-around for specific PHYs Some Ethernet PHYs/switches such as Broadcom's BCM53125 have a hardware bug which makes them not release the MDIO line during turn-around time. This gets flagged by the UniMAC MDIO controller as a read failure, and we fail the read transaction. Check the MDIO bus phy_ignore_ta_mask bitmask for the PHY we are reading from and if it is listed in this bitmask, ignore the read failure and proceed with returning the data we read out of the controller. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:32:21 -07:00
Florian Fainelli	9d3366e95d	net: bcmgenet: handle broken turn-around for specific PHYs Some Ethernet PHYs/switches such as Broadcom's BCM53125 have a hardware bug which makes them not release the MDIO line during turn-around time. This gets flagged by the GENET MDIO controller as a read failure, and we fail the read transaction. Check the MDIO bus phy_ignore_ta_mask bitmask for the PHY we are reading from and if it is listed in this bitmask, ignore the read failure and proceed with returning the data we read out of the controller. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:32:20 -07:00
Gustavo Zacarias	1059261e7d	net: phy: davicom: add IDs for DM9161B and C variants Add PHY IDs for Davicom DM9161B and DM9161C variants. Tested with a DM9161C on a custom Atmel-based SAM9X25 board in RMII mode. The DM9161B uses the same model id with just the LSB bit of the version id changing (which is masked out). For all intents and purposes they're the same as the DM9161A with an added GPSI mode and better fabrication process. Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:24:53 -07:00
Sergei Shtylyov	a0d2f20650	Renesas Ethernet AVB PTP clock driver Ethernet AVB device includes the gPTP timer, so we can implement a PTP clock driver. We're doing that in a separate file, with the main Ethernet driver calling the PTP driver's [de]initialization and interrupt handler functions. Unfortunately, the clock seems tightly coupled with the AVB-DMAC, so when that one leaves the operation mode, we have to unregister the PTP clock... :-( Based on the original patches by Masaru Nagai. Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:24:12 -07:00
Sergei Shtylyov	c156633f13	Renesas Ethernet AVB driver proper Ethernet AVB includes an Gigabit Ethernet controller (E-MAC) that is basically compatible with SuperH Gigabit Ethernet E-MAC. Ethernet AVB has a dedicated direct memory access controller (AVB-DMAC) that is a new design compared to the SuperH E-DMAC. The AVB-DMAC is compliant with 3 standards formulated for IEEE 802.1BA: IEEE 802.1AS timing and synchronization protocol, IEEE 802.1Qav real- time transfer, and the IEEE 802.1Qat stream reservation protocol. The driver only supports device tree probing, so the binding document is included in this patch. Based on the original patches by Mitsuhiro Kimura. Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:14:29 -07:00
Kenneth Klette Jonassen	2b0a8c9eee	tcp: add CDG congestion control CAIA Delay-Gradient (CDG) is a TCP congestion control that modifies the TCP sender in order to [1]: o Use the delay gradient as a congestion signal. o Back off with an average probability that is independent of the RTT. o Coexist with flows that use loss-based congestion control, i.e., flows that are unresponsive to the delay signal. o Tolerate packet loss unrelated to congestion. (Disabled by default.) Its FreeBSD implementation was presented for the ICCRG in July 2012; slides are available at http://www.ietf.org/proceedings/84/iccrg.html Running the experiment scenarios in [1] suggests that our implementation achieves more goodput compared with FreeBSD 10.0 senders, although it also causes more queueing delay for a given backoff factor. The loss tolerance heuristic is disabled by default due to safety concerns for its use in the Internet [2, p. 45-46]. We use a variant of the Hybrid Slow start algorithm in tcp_cubic to reduce the probability of slow start overshoot. [1] D.A. Hayes and G. Armitage. "Revisiting TCP congestion control using delay gradients." In Networking 2011, pages 328-341. Springer, 2011. [2] K.K. Jonassen. "Implementing CAIA Delay-Gradient in Linux." MSc thesis. Department of Informatics, University of Oslo, 2015. Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: David Hayes <davihay@ifi.uio.no> Cc: Andreas Petlund <apetlund@simula.no> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu> Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:09:12 -07:00
Kenneth Klette Jonassen	7782ad8bb5	tcp: export tcp_enter_cwr() Upcoming tcp_cdg uses tcp_enter_cwr() to initiate PRR. Export this function so that CDG can be compiled as a module. Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: David Hayes <davihay@ifi.uio.no> Cc: Andreas Petlund <apetlund@simula.no> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu> Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-11 00:09:12 -07:00
Joerg Roedel	d290f1e70d	iommu: Introduce iommu_request_dm_for_dev() This function can be called by an IOMMU driver to request that a device's default domain is direct mapped. Signed-off-by: Joerg Roedel <jroedel@suse.de>	2015-06-11 09:01:55 +02:00
Scott Feldman	af201f723f	switchdev: fix handling for drivers not supporting IPv4 fib add/del ops If CONFIG_NET_SWITCHDEV is enabled, but port driver does not implement support for IPv4 FIB add/del ops, don't fail route add/del offload operations. Route adds will not be marked as OFFLOAD. Routes will be installed in the kernel FIB, as usual. This was report/fixed by Florian when testing DSA driver with net-next on devices with L2 offload support but no L3 offload support. What he reported was an initial route installed from DHCP client would fail (route not installed to kernel FIB). This was triggering the setting of ipv4.fib_offload_disabled, which would disable route offloading after the first failure. So subsequent attempts to install the route would succeed. There is follow-on work/discussion to address the handling of route install failures, but for now, let's differentiate between no support and failed support. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:56:34 -07:00
Govindarajulu Varadarajan	8b13b4e0bc	enic: fix memory leak in rq_clean When incoming packet qualifies for rx_copybreak, we copy the data to newly allocated skb. We do not free/unmap the original buffer. At this point driver assumes this buffer is unallocated. When enic_rq_alloc_buf() is called for buffer allocation, it checks if buf->os_buf is NULL. If its not NULL that means buffer can be re-used. When vnic_rq_clean() is called for freeing all rq buffers, and if the rx_copybreak reused buffer falls outside the used desc, we do not free the buffer. The following trace is observer when dma-debug is enabled. Fix is to walk through complete ring and clean if buffer is present. [ 40.555386] ------------[ cut here ]------------ [ 40.555396] WARNING: CPU: 0 PID: 491 at lib/dma-debug.c:971 dma_debug_device_change+0x188/0x1f0() [ 40.555400] pci 0000:06:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=4] One of leaked entries details: [device address=0x00000000ff4cc040] [size=9018 bytes] [mapped with DMA_FROM_DEVICE] [mapped as single] [ 40.555402] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver coretemp intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw joydev mousedev gf128mul hid_generic glue_helper mgag200 usbhid ttm hid drm_kms_helper drm ablk_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit i2c_core iTCO_wdt cryptd mac_hid evdev pcspkr sb_edac edac_core tpm_tis iTCO_vendor_support ipmi_si wmi tpm ipmi_msghandler shpchp lpc_ich processor acpi_power_meter hwmon button ac sch_fq_codel nfs lockd grace sunrpc fscache sd_mod ehci_pci ehci_hcd megaraid_sas usbcore scsi_mod usb_common enic(-) crc32c_generic crc32c_intel btrfs xor raid6_pq ext4 crc16 mbcache jbd2 [ 40.555467] CPU: 0 PID: 491 Comm: rmmod Not tainted 4.1.0-rc7-ARCH-01305-gf59b71f #118 [ 40.555469] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.2.23.061220140128 06/12/2014 [ 40.555471] 0000000000000000 00000000e2f8a5b7 ffff880275f8bc48 ffffffff8158d6f0 [ 40.555474] 0000000000000000 ffff880275f8bca0 ffff880275f8bc88 ffffffff8107b04a [ 40.555477] ffff8802734e0000 0000000000000004 ffff8804763fb3c0 ffff88027600b650 [ 40.555480] Call Trace: [ 40.555488] [<ffffffff8158d6f0>] dump_stack+0x4f/0x7b [ 40.555492] [<ffffffff8107b04a>] warn_slowpath_common+0x8a/0xc0 [ 40.555494] [<ffffffff8107b0d5>] warn_slowpath_fmt+0x55/0x70 [ 40.555498] [<ffffffff812fa408>] dma_debug_device_change+0x188/0x1f0 [ 40.555503] [<ffffffff8109aaef>] notifier_call_chain+0x4f/0x80 [ 40.555506] [<ffffffff8109aecb>] __blocking_notifier_call_chain+0x4b/0x70 [ 40.555510] [<ffffffff8109af06>] blocking_notifier_call_chain+0x16/0x20 [ 40.555514] [<ffffffff813f8066>] __device_release_driver+0xf6/0x120 [ 40.555518] [<ffffffff813f8b08>] driver_detach+0xc8/0xd0 [ 40.555523] [<ffffffff813f7c59>] bus_remove_driver+0x59/0xe0 [ 40.555527] [<ffffffff813f93a0>] driver_unregister+0x30/0x70 [ 40.555534] [<ffffffff8131532d>] pci_unregister_driver+0x2d/0xa0 [ 40.555542] [<ffffffffa0200ec2>] enic_cleanup_module+0x10/0x14e [enic] [ 40.555547] [<ffffffff8110158f>] SyS_delete_module+0x1cf/0x280 [ 40.555551] [<ffffffff811e284e>] ? ____fput+0xe/0x10 [ 40.555554] [<ffffffff810980ec>] ? task_work_run+0xbc/0xf0 [ 40.555558] [<ffffffff815930ee>] system_call_fastpath+0x12/0x71 [ 40.555561] ---[ end trace 4988cadc77c2b236 ]--- [ 40.555562] Mapped at: [ 40.555563] [<ffffffff812fa865>] debug_dma_map_page+0x95/0x150 [ 40.555566] [<ffffffffa01f4a88>] enic_rq_alloc_buf+0x1b8/0x360 [enic] [ 40.555570] [<ffffffffa01f7658>] enic_open+0xf8/0x820 [enic] [ 40.555574] [<ffffffff8148d50e>] __dev_open+0xce/0x150 [ 40.555579] [<ffffffff8148d851>] __dev_change_flags+0xa1/0x170 Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:42:39 -07:00
Govindarajulu Varadarajan	19b596bda1	enic: check return value for stat dump We do not check the return value of enic_dev_stats_dump(). If allocation fails, we will hit NULL pointer reference. Return only if memory allocation fails. For other failures, we return the previously recorded values. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:42:39 -07:00
Govindarajulu Varadarajan	6286e82850	enic: unlock napi busy poll before unmasking intr There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi(). In this window if an irq occurs and napi is scheduled on different cpu, it tries to acquire enic_poll_lock_napi() and hits the following WARN_ON message. Fix is to unlock napi_poll before unmasking the interrupt. [ 781.121746] ------------[ cut here ]------------ [ 781.121789] WARNING: CPU: 1 PID: 0 at drivers/net/ethernet/cisco/enic/vnic_rq.h:228 enic_poll_msix_rq+0x36a/0x3c0 [enic]() [ 781.121834] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 dns_resolver coretemp intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel mgag200 ttm drm_kms_helper joydev aes_x86_64 lrw drm gf128mul mousedev glue_helper sb_edac ablk_helper iTCO_wdt iTCO_vendor_support evdev ipmi_si syscopyarea sysfillrect sysimgblt i2c_algo_bit i2c_core edac_core lpc_ich mac_hid cryptd pcspkr ipmi_msghandler shpchp tpm_tis acpi_power_meter tpm wmi processor hwmon button ac sch_fq_codel nfs lockd grace sunrpc fscache hid_generic usbhid hid ehci_pci ehci_hcd sd_mod megaraid_sas usbcore scsi_mod usb_common enic crc32c_generic crc32c_intel btrfs xor raid6_pq ext4 crc16 mbcache jbd2 [ 781.122176] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.0-rc6-ARCH-00040-gc46a024-dirty #106 [ 781.122210] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.2.23.061220140128 06/12/2014 [ 781.122252] 0000000000000000 bddbbc9d655ec96e ffff880277e43da8 ffffffff81583fe8 [ 781.122286] 0000000000000000 0000000000000000 ffff880277e43de8 ffffffff8107acfa [ 781.122319] ffff880272c01000 ffff880273f18000 ffff880273f1a100 0000000000000000 [ 781.122352] Call Trace: [ 781.122364] <IRQ> [<ffffffff81583fe8>] dump_stack+0x4f/0x7b [ 781.122399] [<ffffffff8107acfa>] warn_slowpath_common+0x8a/0xc0 [ 781.122425] [<ffffffff8107ae2a>] warn_slowpath_null+0x1a/0x20 [ 781.122455] [<ffffffffa01fa9ca>] enic_poll_msix_rq+0x36a/0x3c0 [enic] [ 781.122487] [<ffffffff8148525a>] net_rx_action+0x22a/0x370 [ 781.122512] [<ffffffff8107ed3d>] __do_softirq+0xed/0x2d0 [ 781.122537] [<ffffffff8107f06e>] irq_exit+0x7e/0xa0 [ 781.122560] [<ffffffff8158c424>] do_IRQ+0x64/0x100 [ 781.122582] [<ffffffff8158a42e>] common_interrupt+0x6e/0x6e [ 781.122605] <EOI> [<ffffffff810bd331>] ? cpu_startup_entry+0x121/0x480 [ 781.122638] [<ffffffff810bd2fc>] ? cpu_startup_entry+0xec/0x480 [ 781.122667] [<ffffffff810f2ed3>] ? clockevents_register_device+0x113/0x1f0 [ 781.122698] [<ffffffff81050ab6>] start_secondary+0x196/0x1e0 [ 781.122723] ---[ end trace cec2e9dd3af7b9db ]--- Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:42:39 -07:00
David S. Miller	1531407c81	Merge branch 'brcm-pseudo-phy-addr' Florian Fainelli says: ==================== net: phy: broadcom: define pseudo-PHY address This patch series converts existing in-tree users of the Broadcom pseudo-PHY address (30) used to configure MDIO-connected switches to share a constant in a shared header files. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:59 -07:00
Florian Fainelli	aafc66f106	net: dsa: bcm_sf2: Utilize BRCM_PSEUDO_PHY_ADDR Utilize the newly introduced BRCM_PSEUDO_PHY_ADDR constant from brcmphy.h instead of open-coding the Broadcom Ethernet switches pseudo-PHY address (30). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:58 -07:00
Florian Fainelli	4447d2adfa	bgmac: Utilize BRCM_PSEUDO_PHY_ADDR What BGMAC defines as BGMAC_PHY_NOREGS is in fact the Broadcom Ethernet switches' pseudo-PHY address (30), utilize the newly introduced constant from brcmphy.h Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:58 -07:00
Florian Fainelli	2729b42708	b44: Utilize BRCM_PSEUDO_PHY_ADDR What B44 has been locally using as B44_PHY_ADDR_NO_LOCAL_PHY is in fact the Broadcom Ethernet switches pseudo-PHY address (30). Update the header to use the newly introduced constant and update comments so they are within 80 columns and consistent. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:58 -07:00
Florian Fainelli	8bc84b7926	net: phy: broadcom: define Broadcom pseudo-PHY address in brcmphy.h Define the pseudo-PHY address (30) which is used by all Broadcom Ethernet switches in a shared header file. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:58 -07:00
Florian Fainelli	4f822c625f	net: phy: broadcom: include phy.h for brcmphy.h We utilize inline functions from the PHY library, make sure that we do include phy.h in brcmphy.h in order for the code including brcmphy.h not to have to resolve this inclusion dependency. Fixes: `705314797b` ("net: phy: broadcom: move shadow 0x1C register accessors to brcmphy.h") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:33:58 -07:00
Joerg Roedel	94fb933418	x86/crash: Allocate enough low memory when crashkernel=high When the crash kernel is loaded above 4GiB in memory, the first kernel allocates only 72MiB of low-memory for the DMA requirements of the second kernel. On systems with many devices this is not enough and causes device driver initialization errors and failed crash dumps. Testing by SUSE and Redhat has shown that 256MiB is a good default value for now and the discussion has lead to this value as well. So set this default value to 256MiB to make sure there is enough memory available for DMA. Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Reflow comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-4-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-11 08:28:39 +02:00
Joerg Roedel	186dfc9d69	x86/swiotlb: Try coherent allocations with __GFP_NOWARN When we boot a kdump kernel in high memory, there is by default only 72MB of low memory available. The swiotlb code takes 64MB of it (by default) so that there are only 8MB left to allocate from. On systems with many devices this causes page allocator warnings from dma_generic_alloc_coherent(): systemd-udevd: page allocation failure: order:0, mode:0x280d4 CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G W 3.12.28-4-default #1 Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012 ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90 0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 Call Trace: dump_trace+0x7d/0x2d0 show_stack_log_lvl+0x94/0x170 show_stack+0x21/0x50 dump_stack+0x41/0x51 warn_alloc_failed+0xf0/0x160 __alloc_pages_slowpath+0x72f/0x796 __alloc_pages_nodemask+0x1ea/0x210 dma_generic_alloc_coherent+0x96/0x140 x86_swiotlb_alloc_coherent+0x1c/0x50 ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm] ttm_dma_populate+0x3ce/0x640 [ttm] ttm_tt_bind+0x36/0x60 [ttm] ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm] ttm_bo_move_buffer+0x105/0x130 [ttm] ttm_bo_validate+0xc1/0x130 [ttm] ttm_bo_init+0x24b/0x400 [ttm] radeon_bo_create+0x16c/0x200 [radeon] radeon_ring_init+0x11e/0x2b0 [radeon] r100_cp_init+0x123/0x5b0 [radeon] r100_startup+0x194/0x230 [radeon] r100_init+0x223/0x410 [radeon] radeon_device_init+0x6af/0x830 [radeon] radeon_driver_load_kms+0x89/0x180 [radeon] drm_get_pci_dev+0x121/0x2f0 [drm] local_pci_probe+0x39/0x60 pci_device_probe+0xa9/0x120 driver_probe_device+0x9d/0x3d0 __driver_attach+0x8b/0x90 bus_for_each_dev+0x5b/0x90 bus_add_driver+0x1f8/0x2c0 driver_register+0x5b/0xe0 do_one_initcall+0xf2/0x1a0 load_module+0x1207/0x1c70 SYSC_finit_module+0x75/0xa0 system_call_fastpath+0x16/0x1b 0x7fac533d2788 After these warnings the code enters a fall-back path and allocated directly from the swiotlb aperture in the end. So remove these warnings as this is not a fatal error. Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Simplify, reflow comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-3-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-11 08:28:38 +02:00
Joerg Roedel	94cc81f9a8	swiotlb: Warn on allocation failure in swiotlb_alloc_coherent() Print a warning when all allocation tries have been failed and the function is about to return NULL. This prepares for calling the function with __GFP_NOWARN to suppress allocation failure warnings before all fall-backs have failed - which we'll do to improve kdump behavior. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-2-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-11 08:28:38 +02:00
Eric Dumazet	f9c2ff22bb	net: tcp: dctcp_update_alpha() fixes. dctcp_alpha can be read by from dctcp_get_info() without synchro, so use WRITE_ONCE() to prevent compiler from using dctcp_alpha as a temporary variable. Also, playing with small dctcp_shift_g (like 1), can expose an overflow with 32bit values shifted 9 times before divide. Use an u64 field to avoid this problem, and perform the divide only if acked_bytes_ecn is not zero. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:28:33 -07:00
Mel Gorman	5d75361027	net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap Jeff Layton reported the following; [ 74.232485] ------------[ cut here ]------------ [ 74.233354] WARNING: CPU: 2 PID: 754 at net/core/sock.c:364 sk_clear_memalloc+0x51/0x80() [ 74.234790] Modules linked in: cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xfs libcrc32c snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device nfsd snd_pcm snd_timer snd e1000 ppdev parport_pc joydev parport pvpanic soundcore floppy serio_raw i2c_piix4 pcspkr nfs_acl lockd virtio_balloon acpi_cpufreq auth_rpcgss grace sunrpc qxl drm_kms_helper ttm drm virtio_console virtio_blk virtio_pci ata_generic virtio_ring pata_acpi virtio [ 74.243599] CPU: 2 PID: 754 Comm: swapoff Not tainted 4.1.0-rc6+ #5 [ 74.244635] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 74.245546] 0000000000000000 0000000079e69e31 ffff8800d066bde8 ffffffff8179263d [ 74.246786] 0000000000000000 0000000000000000 ffff8800d066be28 ffffffff8109e6fa [ 74.248175] 0000000000000000 ffff880118d48000 ffff8800d58f5c08 ffff880036e380a8 [ 74.249483] Call Trace: [ 74.249872] [<ffffffff8179263d>] dump_stack+0x45/0x57 [ 74.250703] [<ffffffff8109e6fa>] warn_slowpath_common+0x8a/0xc0 [ 74.251655] [<ffffffff8109e82a>] warn_slowpath_null+0x1a/0x20 [ 74.252585] [<ffffffff81661241>] sk_clear_memalloc+0x51/0x80 [ 74.253519] [<ffffffffa0116c72>] xs_disable_swap+0x42/0x80 [sunrpc] [ 74.254537] [<ffffffffa01109de>] rpc_clnt_swap_deactivate+0x7e/0xc0 [sunrpc] [ 74.255610] [<ffffffffa03e4fd7>] nfs_swap_deactivate+0x27/0x30 [nfs] [ 74.256582] [<ffffffff811e99d4>] destroy_swap_extents+0x74/0x80 [ 74.257496] [<ffffffff811ecb52>] SyS_swapoff+0x222/0x5c0 [ 74.258318] [<ffffffff81023f27>] ? syscall_trace_leave+0xc7/0x140 [ 74.259253] [<ffffffff81798dae>] system_call_fastpath+0x12/0x71 [ 74.260158] ---[ end trace 2530722966429f10 ]--- The warning in question was unnecessary but with Jeff's series the rules are also clearer. This patch removes the warning and updates the comment to explain why sk_mem_reclaim() may still be called. [jlayton: remove if (sk->sk_forward_alloc) conditional. As Leon points out that it's not needed.] Cc: Leon Romanovsky <leon@leon.nu> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 23:02:31 -07:00
David S. Miller	1edaa7e8a7	For this round we mostly have fixes: * mesh fixes from Alexis Green and Chun-Yeow Yeoh, * a documentation fix from Jakub Kicinski, * a missing channel release (from Michal Kazior), * a fix for a signal strength reporting bug (from Sara Sharon), * handle deauth while associating (myself), * don't report mangled TX SKB back to userspace for status (myself), * handle aggregation session timeouts properly in fast-xmit (myself) However, there are also a few cleanups and one big change that affects all drivers (and that required me to pull in your tree) to change the mac80211 HW flags to use an unsigned long bitmap so that we can extend them more easily - we're running out of flags even with a cleanup to remove the two unused ones. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVeEQ1AAoJEDBSmw7B7bqrebcP/3v7I2ZXAeHag2W4hdD4YH6W tuKfs3JKW3GDh84l2AJs2JBpFxR6Tk0Z7zGKrPLzkBTkkJkSLgKuUKR0+YQU6PYH VfZ2NkdIHEqouLgMWxGGlp6suqp2yYD9tiIUroICXZ6aFm5trQuZgzv5ePI+lhmX cWUYCawE2tcpVdg0NJsFExeCJhw81e/Bet1LCGHo0asWNpIK7phMdltzD7e4tgQS 4q475FCIkWxbxKgJRrRkz8J7grsjK1wf2W3acOxKMaoVBeqJVW5BWDrTgo0aDPts qQ8n8t1s9o/jKQIvaz3RyjkQgX8T4vCMqkouLF4jJOThKIsUSi3Fvm9oKcMg4YhA Ju5QWfbCBFhpLZeBzWzKyePTnDru1XDFFVdIATLONKTVg1modzFAs3j5gb4Z3Wtg VYLoLWWpRtHKd9pzfZMhyWq64Xb8C+qlyQHr4r4QRm9ADz0Jq+OCh0rTFt+/bncM CHxnf0VS9hEOFk0+TxFqi2yXOnv2uMgcN+jnGkEs4QuLfv9ML1Eb23ZjDoHxd1uq 1Yd4R8IDEY/KU6UJMwksz+gV/ekoB32eAhw56pxehgAMuZL4OgNvmeAQHx7Jq9it 0/OfAK2BSNH8odqYQbpg89C8keqSInMwUhFyRhyMJAWSKiPRHypsDBWxMKGJIssI 3mB4d/go+RP1AvZnazeF =2wTw -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For this round we mostly have fixes: * mesh fixes from Alexis Green and Chun-Yeow Yeoh, * a documentation fix from Jakub Kicinski, * a missing channel release (from Michal Kazior), * a fix for a signal strength reporting bug (from Sara Sharon), * handle deauth while associating (myself), * don't report mangled TX SKB back to userspace for status (myself), * handle aggregation session timeouts properly in fast-xmit (myself) However, there are also a few cleanups and one big change that affects all drivers (and that required me to pull in your tree) to change the mac80211 HW flags to use an unsigned long bitmap so that we can extend them more easily - we're running out of flags even with a cleanup to remove the two unused ones. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 22:49:49 -07:00
Stephen Smalley	37a9a8df8c	net/unix: support SCM_SECURITY for stream sockets SCM_SECURITY was originally only implemented for datagram sockets, not for stream sockets. However, SCM_CREDENTIALS is supported on Unix stream sockets. For consistency, implement Unix stream support for SCM_SECURITY as well. Also clean up the existing code and get rid of the superfluous UNIXSID macro. Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211, where systemd was using SCM_CREDENTIALS and assumed wrongly that SCM_SECURITY was also supported on Unix stream sockets. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 22:49:20 -07:00
Vaishali Thakkar	bae23b6840	atm: idt77105: Use setup_timer Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @change@ expression e1, e2, a; @@ -init_timer(&e1); +setup_timer(&e1, a, 0UL); ... when != a = e2 -e1.function = a; Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 22:46:34 -07:00
Raghu Vatsavayi	f21fb3ed36	Add support of Cavium Liquidio ethernet adapters Following patch V8 adds support for Cavium Liquidio pci express based 10Gig ethernet adapters. 1) Consolidated all debug macros to either call dev_* or netdev_* macros directly, feedback from previous patch. 2) Changed soft commands to avoid crash when running in interrupt context. 3) Fixed link status not reflecting correct status when NetworkManager is running. Added MODULE_FIRMWARE declarations. Following were the previous patches. Patch V7: 1) Minor comments from v6 release regarding debug statements. 2) Fix for large multicast lists. 3) Fixed lockup issue if port initialization fails. 4) Enabled MSI by default. https://patchwork.ozlabs.org/patch/464441/ Patch V6: 1) Addressed the uint64 vs u64 issue, feedback from previous patch. 2) Consolidated some receive processing routines. 3) Removed link status polling method. https://patchwork.ozlabs.org/patch/459514/ Patch V5: Based on the feedback from earlier patches with regards to consolidation of common functions like device init, register programming for cn66xx and cn68xx devices. https://patchwork.ozlabs.org/patch/438979/ Patch V4: Following were the changes based on the feedback from earlier patch: 1) Added mmiowb while synchronizing queue updates and other hw interactions. 2) Statistics will now be incremented non-atomically per each ring. liquidio_get_stats will add stats of each ring while reporting the total statistics counts. 3) Modified liquidio_ioctl to return proper return codes. 4) Modified device naming to use standard Ethernet naming. 5) Global function names in the driver will have lio_/liquidio_/octeon_ prefix. 6) Ethtool related changes for: Removed redundant stats and jiffies. Use default ethtool handler of link status. Speed setting will make use of ethtool_cmd_speed_set. 7) Added checks for pci_map_* return codes. 8) Check for signals while waiting in interruptible mode https://patchwork.ozlabs.org/patch/435073/ Patch v3: Implemented feedback from previous patch like: Removed NAPI Config and DEBUG config options, added BQL and xmit_more support. https://patchwork.ozlabs.org/patch/422749/ Patch V2: Implemented feedback from previous patch. https://patchwork.ozlabs.org/patch/413539/ First Patch: https://patchwork.ozlabs.org/patch/412946/ Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com> Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com> Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com> Signed-off-by: Robert Richter <Robert.Richter@caviumnetworks.com> Signed-off-by: Aleksey Makarov <Aleksey.Makarov@caviumnetworks.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-10 22:44:08 -07:00
Alexey Kardashevskiy	e633bc86a9	vfio: powerpc/spapr: Support Dynamic DMA windows This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	2157e7b82f	vfio: powerpc/spapr: Register memory and define IOMMU v2 The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would requite additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and system uses 4K or 64K pages. Another issue is that actual pages pinning/unpinning happens on every DMA map/unmap request. This does not affect the performance much now as we spend way too much time now on switching context between guest/userspace/host but this will start to matter when we add in-kernel DMA map/unmap acceleration. This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces 2 new ioctls to register/unregister DMA memory - VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - which receive user space address and size of a memory region which needs to be pinned/unpinned and counted in locked_vm. New IOMMU splits physical pages pinning and TCE table update into 2 different operations. It requires: 1) guest pages to be registered first 2) consequent map/unmap requests to work only with pre-registered memory. For the default single window case this means that the entire guest (instead of 2GB) needs to be pinned before using VFIO. When a huge DMA window is added, no additional pinning will be required, otherwise it would be guest RAM + 2GB. The new memory registration ioctls are not supported by VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration will require memory to be preregistered in order to work. The accounting is done per the user process. This advertises v2 SPAPR TCE IOMMU and restricts what the userspace can do with v1 or v2 IOMMUs. In order to support memory pre-registration, we need a way to track the use of every registered memory region and only allow unregistration if a region is not in use anymore. So we need a way to tell from what region the just cleared TCE was from. This adds a userspace view of the TCE table into iommu_table struct. It contains userspace address, one per TCE entry. The table is only allocated when the ownership over an IOMMU group is taken which means it is only used from outside of the powernv code (such as VFIO). As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support DDW API), this creates a default DMA window for IODA2 for consistency. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	15b244a88e	powerpc/mmu: Add userspace-to-physical addresses translation cache We are adding support for DMA memory pre-registration to be used in conjunction with VFIO. The idea is that the userspace which is going to run a guest may want to pre-register a user space memory region so it all gets pinned once and never goes away. Having this done, a hypervisor will not have to pin/unpin pages on every DMA map/unmap request. This is going to help with multiple pinning of the same memory. Another use of it is in-kernel real mode (mmu off) acceleration of DMA requests where real time translation of guest physical to host physical addresses is non-trivial and may fail as linux ptes may be temporarily invalid. Also, having cached host physical addresses (compared to just pinning at the start and then walking the page table again on every H_PUT_TCE), we can be sure that the addresses which we put into TCE table are the ones we already pinned. This adds a list of memory regions to mm_context_t. Each region consists of a header and a list of physical addresses. This adds API to: 1. register/unregister memory regions; 2. do final cleanup (which puts all pre-registered pages); 3. do userspace to physical address translation; 4. manage usage counters; multiple registration of the same memory is allowed (once per container). This implements 2 counters per registered memory region: - @mapped: incremented on every DMA mapping; decremented on unmapping; initialized to 1 when a region is just registered; once it becomes zero, no more mappings allowe; - @used: incremented on every "register" ioctl; decremented on "unregister"; unregistration is allowed for DMA mapped regions unless it is the very last reference. For the very last reference this checks that the region is still mapped and returns -EBUSY so the userspace gets to know that memory is still pinned and unregistration needs to be retried; @used remains 1. Host physical addresses are stored in vmalloc'ed array. In order to access these in the real mode (mmu off), there is a real_vmalloc_addr() helper. In-kernel acceleration patchset will move it from KVM to MMU code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	46d3e1e162	vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Before the IOMMU user (VFIO) would take control over the IOMMU table belonging to a specific IOMMU group. This approach did not allow sharing tables between IOMMU groups attached to the same container. This introduces a new IOMMU ownership flavour when the user can not just control the existing IOMMU table but remove/create tables on demand. If an IOMMU implements take/release_ownership() callbacks, this lets the user have full control over the IOMMU group. When the ownership is taken, the platform code removes all the windows so the caller must create them. Before returning the ownership back to the platform code, VFIO unprograms and removes all the tables it created. This changes IODA2's onwership handler to remove the existing table rather than manipulating with the existing one. From now on, iommu_take_ownership() and iommu_release_ownership() are only called from the vfio_iommu_spapr_tce driver. Old-style ownership is still supported allowing VFIO to run on older P5IOC2 and IODA IO controllers. No change in userspace-visible behaviour is expected. Since it recreates TCE tables on each ownership change, related kernel traces will appear more often. This adds a pnv_pci_ioda2_setup_default_config() which is called when PE is being configured at boot time and when the ownership is passed from VFIO to the platform code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	0054719386	powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table This adds a way for the IOMMU user to know how much a new table will use so it can be accounted in the locked_vm limit before allocation happens. This stores the allocated table size in pnv_pci_ioda2_get_table_size() so the locked_vm counter can be updated correctly when a table is being disposed. This defines an iommu_table_group_ops callback to let VFIO know how much memory will be locked if a table is created. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:53 +10:00
Alexey Kardashevskiy	c035e37b58	powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release The existing code programmed TVT#0 with some address and then immediately released that memory. This makes use of pnv_pci_ioda2_unset_window() and pnv_pci_ioda2_set_bypass() which do correct resource release and TVT update. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:53 +10:00

... 104 105 106 107 108 ...

534912 Commits