linux

Author	SHA1	Message	Date
David S. Miller	d4185bbf62	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c Minor conflict between the BCM_CNIC define removal in net-next and a bug fix added to net. Based upon a conflict resolution patch posted by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-10 18:32:51 -05:00
Michael S. Tsirkin	24eb21a148	vhost-net: reduce vq polling on tx zerocopy It seems that to avoid deadlocks it is enough to poll vq before we are going to use the last buffer. This is faster than `c70aa540c7`. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	eaae8132ef	vhost-net: select tx zero copy dynamically Even when vhost-net is in zero-copy transmit mode, net core might still decide to copy the skb later which is somewhat slower than a copy in user context: data copy overhead is added to the cost of page pin/unpin. The result is that enabling tx zero copy option leads to higher CPU utilization for guest to guest and guest to host traffic. To fix this, suppress zero copy tx after a given number of packets triggered late data copy. Re-enable periodically to detect workload changes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	b211616d71	vhost: move -net specific code out Zerocopy handling code is vhost-net specific. Move it from vhost.c/vhost.h out to net.c Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:58 -04:00
Michael S. Tsirkin	c4fcb586c3	vhost: track zero copy failures using DMA length This will be used to disable zerocopy when error rate is high. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	70e4cb9aaf	vhost-net: cleanup macros for DMA status tracking Better document macros for DMA tracking. Add an explicit one for DMA in progress instead of relying on user supplying len != 1. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	e19d6763cc	skb: report completion status for zero copy skbs Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-02 21:29:57 -04:00
Michael S. Tsirkin	910a578f7e	vhost: fix mergeable bufs on BE hosts We copy head count to a 16 bit field, this works by chance on LE but on BE guest gets 0. Fix it up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Alexander Graf <agraf@suse.de> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-24 23:19:30 -04:00
Linus Torvalds	a188e7e93a	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi target updates from Nicholas Bellinger: "Things have been calm for the most part with no new fabric drivers in flight for v3.7 (we're up to eight now !), so this update is primarily focused on addressing a few long-standing items within target-core and iscsi-target fabric code. The highlights include: - target: Simplify fabric sense data length handling (roland) - qla2xxx: Fix endianness of task management response code (roland) - target: fix truncation of mode data, support zero allocation length (paolo) - target: Properly support zero-length commands in normal processing path (paolo) - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT PDU (ronnie + nab) - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode (ronnie + nab) - target/file: Re-enable optional fd_buffered_io=1 operation (nab + hch) - iscsi-target: Add MaxXmitDataSegmenthLength forr target -> initiator MDRSL declaration (nab) - target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough (nab + hch) - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch + nab) - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab + hch) The last series for adding a new target_submit_cmd_map_sgls() fabric caller (as requested by hch) that accepts pre-allocated SGL memory (using existing logic), along with converting tcm_loop + tcm_vhost has only been in -next for the last days, but has gotten enough review +testing and is clear enough a mechanical change that I think it's reasonable to merge for -rc1 code. Thanks again to everyone who contributed this round! Extra special thanks to Roland (PureStorage) for tracking down the qla2xxx target TMR response code endian issue, and to Paolo (Redhat) for resolving the long standing zero-length CDB issues within target-core between virtual and pSCSI backends." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits) iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values iscsit: proper endianess conversions iscsit: use the itt_t abstract type iscsit: add missing endianess conversion in iscsit_check_inaddr_any iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp iscsit: mark various functions static target/iscsi: precedence bug in iscsit_set_dataout_sequence_values() target/usb-gadget: strlen() doesn't count the terminator target/usb-gadget: remove duplicate initialization tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls target: Add control CDB READ payload zero work-around tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength iscsi-target: Add MaxXmitDataSegmentLength connection recovery check iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength iscsi-target: Enable MaxXmitDataSegmentLength operation in login path iscsi-target: Add base MaxXmitDataSegmentLength code target/file: Re-enable optional fd_buffered_io=1 operation ...	2012-10-10 19:52:19 +09:00
Nicholas Bellinger	9f0abc1554	tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls This patch converts tcm_vhost to use target_submit_cmd_map_sgls() for I/O submission and mapping of pre-allocated SGL memory from incoming virtio-scsi SGL memory -> se_cmd descriptors. This includes removing the original open-coded fabric uses of target core callers to support transport_generic_map_mem_to_cmd() between target_setup_cmd_from_cdb() and transport_handle_cdb_direct() logic. It also includes adding a handful of new tcm_vhost_cmnd member + assignments in vhost_scsi_allocate_cmd() used from cmwq process context I/O submission within tcm_vhost_submission_work() (v2: Use renamed target_submit_cmd_map_sgls) Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-10-02 14:16:20 -07:00
Al Viro	cecb46f194	vhost_set_vring(): turn pollstart/pollstop into bool Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:12 -04:00
Roland Dreier	9c58b7ddd7	target: Simplify fabric sense data length handling Every fabric driver has to supply a se_tfo->set_fabric_sense_len() method, just so iSCSI can return an offset of 2. However, every fabric driver is already allocating a sense buffer and passing it into the target core, either via transport_init_se_cmd() or target_submit_cmd(). So instead of having iSCSI pass the start of its sense buffer into the core and then later tell the core to skip the first 2 bytes, it seems easier for iSCSI just to do the offset of 2 when it passes the sense buffer into the core. Then we can drop the se_tfo->set_fabric_sense_len() everywhere, and just add a couple of lines of code to iSCSI to set the sense data length to the beginning of the buffer right before it sends it over the network. (nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops + change transport_get_sense_buffer to follow v3.6-rc6 code w/o ->set_fabric_sense_len usage) Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 17:12:58 -07:00
Roland Dreier	2ed772b7b9	target: Remove unused target_core_fabric_ops.get_fabric_sense_len method There are no callers of se_tfo->get_fabric_sense_len(), so we should stop having every fabric driver implement it. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 16:15:47 -07:00
Michael S. Tsirkin	6de7145ca3	tcm_vhost: Fix vhost_scsi_target structure alignment Here TRANSPORT_IQN_LEN is 224, which is a multiple of 4. Since vhost_tpgt is 2 bytes and abi_version is 4, the total size would be 230. But gcc needs struct size be aligned to first field size, which is 4 bytes, so it pads the structure by extra 2 bytes to the total of 232. This padding is very undesirable in an ABI: - it can not be initialized easily - it can not be checked easily - it can leak information between kernel and userspace Simplest solution is probably just to make the padding explicit. (v2: Add check for zero'ed backend->reserved field for VHOST_SCSI_SET_ENDPOINT and VHOST_SCSI_CLEAR_ENDPOINT ops as requested by MST) Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-20 14:52:11 -07:00
Nicholas Bellinger	5b7517f814	tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char * This patch changes the vhost_scsi_target->vhost_wwpn[] type used by VHOST_SCSI_* ioctls to 'char *' as requested by Blue Swirl in order to match the latest QEMU vhost-scsi RFC-v3 userspace code. Queuing this up into target-pending/master for a -rc3 PULL. Reported-by: Blue Swirl <blauwirbel@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 19:03:13 -07:00
Nicholas Bellinger	101998f6fc	tcm_vhost: Post-merge review changes requested by MST This patch contains the post RFC-v5 (post-merge) changes, this includes: - Add locking comment - Move vhost_scsi_complete_cmd ahead of TFO callbacks in order to drop forward declarations - Drop extra '!= NULL' usage in vhost_scsi_complete_cmd_work() - Change vhost_scsi_*_handle_kick() to use pr_debug - Fix possible race in vhost_scsi_set_endpoint() for vs->vs_tpg checking + assignment. - Convert tv_tpg->tpg_vhost_count + ->tv_tpg_port_count from atomic_t -> int, and make sure reference is protected by ->tv_tpg_mutex. - Drop unnecessary vhost_scsi->vhost_ref_cnt - Add 'err:' label for exception path in vhost_scsi_clear_endpoint() - Add enum for VQ numbers, add usage in vhost_scsi_open() - Add vhost_scsi_flush() + vhost_scsi_flush_vq() following drivers/vhost/net.c - Add smp_wmb() + vhost_scsi_flush() call during vhost_scsi_set_features() - Drop unnecessary copy_from_user() usage with GET_ABI_VERSION ioctl - Add missing vhost_scsi_compat_ioctl() caller for vhost_scsi_fops - Fix function parameter definition first line to follow existing vhost code style - Change 'vHost' usage -> 'vhost' in handful of locations - Change -EPERM -> -EBUSY usage for two failures in tcm_vhost_drop_nexus() - Add comment for tcm_vhost_workqueue in tcm_vhost_init() - Make GET_ABI_VERSION return 'int' + add comment in tcm_vhost.h Reported-by: Michael S. Tsirkin <mst@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 17:33:40 -07:00
Fengguang Wu	f0e0e9bba5	tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl Fix up a new coccinelle warnings reported by Fengguang Wu + Intel 0-DAY kernel build testing backend: drivers/vhost/tcm_vhost.c:537:23-29: ERROR: allocation function on line 533 returns NULL not ERR_PTR on failure vim +537 drivers/vhost/tcm_vhost.c 534 if (!sg) 535 return -ENOMEM; 536 pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__, > 537 sg, sgl_count, IS_ERR(sg)); 538 sg_init_table(sg, sgl_count); 539 540 tv_cmd->tvc_sgl = sg; Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-08-16 17:33:28 -07:00
Nicholas Bellinger	057cbf49a1	tcm_vhost: Initial merge for vhost level target fabric driver This patch adds the initial code for tcm_vhost, a Vhost level TCM fabric driver for virtio SCSI initiators into KVM guest. This code is currently up and running on v3.5-rc2 host+guest from target-pending/for-next-merge. Using tcm_vhost requires Zhi's -> Stefan -> nab's qemu vhost-scsi tree here: http://git.kernel.org/?p=virt/kvm/nab/qemu-kvm.git;a=shortlog;h=refs/heads/vhost-scsi -- Changelog v4 -> v5: Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as starting point for v3.6-rc code (Stefan + ALiguori + nab) Convert vhost_scsi_handle_vq() to vq_err() (nab + MST) Minor style fixes from checkpatch (nab) Changelog v3 -> v4: Rename vhost_vring_target -> vhost_scsi_target (mst + nab) Use TRANSPORT_IQN_LEN in vhost_scsi_target->vhost_wwpn[] def (nab) Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst) Move TCM_VHOST related ioctl defines from include/linux/vhost.h -> drivers/vhost/tcm_vhost.h as requested by MST (nab) Move Kbuild.tcm include from drivers/staging -> drivers/vhost/, and just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm' Changelog v2 -> v3: Unlock on error in tcm_vhost_drop_nexus() (DanC) Fix strlen() doesn't count the terminator (DanC) Call kfree() on an error path (DanC) Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab) Fix another strlen() off by one in tcm_vhost_make_tport (DanC) Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/ as requested by MST (nab) Changelog v1 -> v2: Fix tv_cmd completion -> release SGL memory leak (nab) Fix sparse warnings for static variable usage ((Fengguang Wu) Fix sparse warnings for min() typing + printk format specs (Fengguang Wu) Convert to cmwq submission for I/O dispatch (nab + hch) Changelog v0 -> v1: Merge into single source + header file, and move to drivers/vhost/ Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-07-29 13:49:10 -07:00
Stefan Hajnoczi	163049aefd	vhost: make vhost work queue visible The vhost work queue allows processing to be done in vhost worker thread context, which uses the owner process mm. Access to the vring and guest memory is typically only possible from vhost worker context so it is useful to allow work to be queued directly by users. Currently vhost_net only uses the poll wrappers which do not expose the work queue functions. However, for tcm_vhost (vhost_scsi) it will be necessary to queue custom work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-07-22 01:22:23 +03:00
Stefan Hajnoczi	0dd05a3b60	vhost: Separate vhost-net features from vhost features In order for other vhost devices to use the VHOST_FEATURES bits the vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES constant. (Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES) Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Asias He <asias@redhat.com> Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-07-22 01:21:53 +03:00
Jens Freimann	d7ffde35e3	vhost: use USER_DS in vhost_worker thread On some architectures address spaces are set up in a way that this is not necessary to work properly but on some others (like s390) it is. Make sure we operate on the user address space to allow copy_xxx_user() from the vhost_worker() thread by setting it explicitly before calling use_mm() and revert it after unuse_mm(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-26 21:10:56 -07:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Basil Gor	c53cff5e42	vhost-net: fix handle_rx buffer size Take vlan header length into account, when vlan id is stored as vlan_tci. Otherwise tagged packets coming from macvtap will be truncated. Signed-off-by: Basil Gor <basil.gor@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-11 18:16:57 -04:00
Jason Wang	c70aa540c7	vhost: zerocopy: poll vq in zerocopy callback We add used and signal guest in worker thread but did not poll the virtqueue during the zero copy callback. This may lead the missing of adding and signalling during zerocopy. Solve this by polling the virtqueue and let it wakeup the worker during callback. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:25 +03:00
Jason Wang	c8fb217af5	vhost_net: zerocopy: adding and signalling immediately when fully copied When a packet were fully copied in zerocopy, we don't wait for the DMA done to mark the done flag, so after the packet were passed to lower device, we need to add used and signal guest immediately. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:24 +03:00
Jason Wang	dbf34207c6	vhost_net: re-poll only on EAGAIN or ENOBUFS Currently, we restart tx polling unconditionally when sendmsg() fails. This would cause unnecessary wakeups of vhost wokers and waste cpu utlization when evil userspace(guest driver) is able to hit EFAULT or EINVAL. The polling is only needed when the socket send buffer were exceeded or not enough memory. So fix this by restarting polling only when sendmsg() returns EAGAIN/ENOBUFS. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:24 +03:00
Jason Wang	c460f05739	vhost_net: zerocopy: fix possible NULL pointer dereference of vq->bufs When we want to disable vhost_net backend while there's a tx work, a possible NULL pointer defernece may happen we we try to deference the vq->bufs after vhost_net_set_backend() assign a NULL to it. As suggested by Michael, fix this by checking the vq->bufs instead of vhost_sock_zcopy(). Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-05-02 18:22:22 +03:00
Linus Torvalds	7e29629543	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix namespace init and cleanup in phonet to fix some oopses, from Eric W. Biederman. 2) Missing kfree_skb() in AF_KEY, from Julia Lawall. 3) Refcount leak and source address handling fix in l2tp from James Chapman. 4) Memory leak fix in CAIF from Tomasz Gregorek. 5) When routes are cloned from ipv6 addrconf routes, we don't process expirations properly. Fix from Gao Feng. 6) Fix panic on DMA errors in atl1 driver, from Tony Zelenoff. 7) Only enable interrupts in 8139cp driver after we've registered the IRQ handler. From Jason Wang. 8) Fix too many reads of KS_CIDER register in ks8851 during probe, fixing crashes on spurious interrupts. From Matt Renzelmann. 9) Missing include in ath5k driver and missing iounmap on probe failure, from Jonathan Bither. 10) Fix RX packet handling in smsc911x driver, from Will Deacon. 11) Fix ixgbe WoL on fiber by leaving the laser on during shutdown. 12) ks8851 needs MAX_RECV_FRAMES increased otherwise the internal MAC buffers are easily overflown. Fix from Davide Cimingahi. 13) Fix memory leaks in peak_usb CAN driver, from Jesper Juhl. 14) gred packet scheduler can dump in WRED more when doing a netlink dump. Fix from David Ward. 15) Fix MTU in USB smsc75xx driver, from Stephane Fillod. 16) Dummy device needs ->ndo_uninit handler to properly handle ->ndo_init failures. From Hiroaki SHIMODA. 17) Fix TX fragmentation in ath9k driver, from Sujith Manoharan. 18) Missing RTNL lock in ixgbe PM resume, from Benjamin Poirier. 19) Missing iounmap in farsync WAN driver, from Julia Lawall. 20) With LRO/GRO, tcp_grow_window() is easily tricked into not growing the receive window properly, and this hurts performance. Fix from Eric Dumazet. 21) Network namespace init failure can leak net_generic data, fix from Julian Anastasov. 22) Fix skb_over_panic due to mis-accounting in TCP for partially ACK'd SKBs. From Eric Dumazet. 23) New IDs for qmi_wwan driver, from Bjørn Mork. 24) Fix races in ax25_exit(), from Eric W. Biederman. 25) IPV6 TCP doesn't handle TCP_MAXSEG socket option properly, copy over logic from the IPV4 side. From Neal Cardwell. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) tcp: fix TCP_MAXSEG for established IPv6 passive sockets drivers/net: Do not free an IRQ if its request failed drop_monitor: allow more events per second ks8851: Fix request_irq/free_irq mismatch net/hyperv: Adding cancellation to ensure rndis filter is closed ks8851: Fix mutex deadlock in ks8851_net_stop() net ax25: Reorder ax25_exit to remove races. icplus: fix interrupt for IC+ 101A/G and 1001LF net: qmi_wwan: support Sierra Wireless MC77xx devices in QMI mode bnx2x: off by one in bnx2x_ets_e3b0_sp_pri_to_cos_set() ksz884x: don't copy too much in netdev_set_mac_address() tcp: fix retransmit of partially acked frames netns: do not leak net_generic data on failed init net/sock.h: fix sk_peek_off kernel-doc warning tcp: fix tcp_grow_window() for large incoming frames drivers/net/wan/farsync.c: add missing iounmap davinci_mdio: Fix MDIO timeout check ipv6: clean up rt6_clean_expires ipv6: fix rt6_update_expires arcnet: rimi: Fix device name in debug output ...	2012-04-22 21:02:57 -07:00
Michael S. Tsirkin	ca8f4fb21d	skbuff: struct ubuf_info callback type safety The skb struct ubuf_info callback gets passed struct ubuf_info itself, not the arg value as the field name and the function signature seem to imply. Rename the arg field to ctx to match usage, add documentation and change the callback argument type to make usage clear and to have compiler check correctness. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-13 13:09:19 -04:00
Zhi Yong Wu	5e7045b010	tools/virtio: fix up vhost/test module build commit `ea5d404655` broke build for the vhost test module used by tools/virtio. Fix it up. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-04-12 10:35:42 +03:00
David S. Miller	f1e84eb3bb	Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2012-03-23 14:46:48 -04:00
Cong Wang	c6daa7ffa8	vhost: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:21 +08:00
Michael S. Tsirkin	ea5d404655	vhost: fix release path lockdep checks We shouldn't hold any locks on release path. Pass a flag to vhost_dev_cleanup to use the lockdep info correctly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Sasha Levin <levinsasha928@gmail.com>	2012-02-28 09:13:22 +02:00
Nadav Har'El	d550dda192	vhost: don't forget to schedule() This is a tiny, but important, patch to vhost. Vhost's worker thread only called schedule() when it had no work to do, and it wanted to go to sleep. But if there's always work to do, e.g., the guest is running a network-intensive program like netperf with small message sizes, schedule() was never called. This had several negative implications (on non-preemptive kernels): 1. Passing time was not properly accounted to the "vhost" process (ps and top would wrongly show it using zero CPU time). 2. Sometimes error messages about RCU timeouts would be printed, if the core running the vhost thread didn't schedule() for a very long time. 3. Worst of all, a vhost thread would "hog" the core. If several vhost threads need to share the same core, typically one would get most of the CPU time (and its associated guest most of the performance), while the others hardly get any work done. The trivial solution is to add if (need_resched()) schedule(); After doing every piece of work. This will not do the heavy schedule() all the time, just when the timer interrupt decided a reschedule is warranted (so need_resched returns true). Thanks to Abel Gordon for this patch. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-02-28 09:13:19 +02:00
stephen hemminger	7c7c7f01cc	vhost-net: add module alias (v2.1) By adding some module aliases, programs (or users) won't have to explicitly call modprobe. Vhost-net will always be available if built into the kernel. It does require assigning a permanent minor number for depmod to work. Also: - use C99 style initialization. - add missing entry in documentation for loop-control Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-13 10:12:23 -08:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Shirley Ma	9e380825ab	vhost: handle wrap around in # of bufs math The meth for calculating the # of outstanding buffers gives incorrect results when vq->upend_idx wraps around zero. Fix that. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-21 10:48:27 +03:00
Michael S. Tsirkin	c047e5f317	vhost-net: update used ring on backend change On backend change, we flushed out outstanding skbs but forgot to update the used ring, so that done entries were left in the ubuf_info ring. As a result we lose heads or complete incorrect ones, crashing the guest or leaking memory. Fix by updating the used ring. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-21 10:23:31 +03:00
Michael S. Tsirkin	b834226b04	vhost: optimize interrupt enable/disable As we now only update used ring after enabling the backend, we can write flags with __put_user: as that's done on data path, it matters. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 17:17:28 +03:00
Michael S. Tsirkin	75fd9edc10	vhost: fix zcopy reference counting Fix get/put refcount imbalance with zero copy, which caused qemu to hang forever on guest driver unload. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Jason Wang	2723feaa8e	vhost: set log when updating used flags or avail event We need to log writes when updating used flags and avail event fields. Otherwise the guest may see a stale value after migration and miss notifying the host. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Jason Wang	f59281dafb	vhost: init used ring after backend was set Move the used ring initialization after backend was set. This makes it possible to disable the backend and tweak the used ring, then restart. This will also make it possible to log the used ring write correctly. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-07-19 13:28:34 +03:00
Michael S. Tsirkin	bab632d69e	vhost: vhost TX zero-copy support >From: Shirley Ma <mashirle@us.ibm.com> This adds experimental zero copy support in vhost-net, disabled by default. To enable, set experimental_zcopytx module option to 1. This patch maintains the outstanding userspace buffers in the sequence it is delivered to vhost. The outstanding userspace buffers will be marked as done once the lower device buffers DMA has finished. This is monitored through last reference of kfree_skb callback. Two buffer indices are used for this purpose. The vhost-net device passes the userspace buffers info to lower device skb through message control. DMA done status check and guest notification are handled by handle_tx: in the worst case is all buffers in the vq are in pending/done status, so we need to notify guest to release DMA done buffers first before we get any new buffers from the vq. One known problem is that if the guest stops submitting buffers, buffers might never get used until some further action, e.g. device reset. This does not seem to affect linux guests. Signed-off-by: Shirley <xma@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-18 10:42:32 -07:00
Michael S. Tsirkin	8ea8cf89e1	vhost: support event index Support the new event index feature. When acked, utilize it to reduce the # of interrupts sent to the guest. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-05-30 11:14:15 +09:30
Rob Landley	6151658751	Correct occurrences of - Documentation/kvm/ to Documentation/virtual/kvm - Documentation/uml/ to Documentation/virtual/uml - Documentation/lguest/ to Documentation/virtual/lguest throughout the kernel source tree. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>	2011-05-06 09:27:55 -07:00
Michael S. Tsirkin	de4d768a42	vhost-net: remove unlocked use of receive_queue Use of skb_queue_empty(&sock->sk->sk_receive_queue) without taking the sk_receive_queue.lock is unsafe or useless. Take it out. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 23:08:19 +02:00
Jason Wang	783e398854	vhost: lock receive queue, not the socket vhost takes a sock lock to try and prevent the skb from being pulled from the receive queue after skb_peek. However this is not the right lock to use for that, sk_receive_queue.lock is. Fix that up. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 23:08:04 +02:00
Jason Wang	94249369e9	vhost-net: Unify the code of mergeable and big buffer handling Codes duplication were found between the handling of mergeable and big buffers, so this patch tries to unify them. This could be easily done by adding a quota to the get_rx_bufs() which is used to limit the number of buffers it returns (for mergeable buffer, the quota is simply UIO_MAXIOV, for big buffers, the quota is just 1), and then the previous handle_rx_mergeable() could be resued also for big buffers. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 17:00:10 +02:00
Jason Wang	cfbdab9513	vhost-net: check the support of mergeable buffer outside the receive loop No need to check the support of mergeable buffer inside the recevie loop as the whole handle_rx()_xx is in the read critical region. So this patch move it ahead of the receiving loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-13 16:57:30 +02:00
Michael S. Tsirkin	fcc042a280	vhost: copy_from_user -> __copy_from_user copy_from_user is pretty high on perf top profile, replacing it with __copy_from_user helps. It's also safe because we do access_ok checks during setup. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2011-03-08 18:03:05 +02:00

1 2 3

121 Commits