Over the years (since the noughties) I have done a lot of work with VMware. I was a VMware guru before I became a NetApp guru, and held VCP 3, 4, 5, 6 and 7. Recently I've been touching VMware products less and less, so I need to refresh my knowledge of NetApp + ONTAP + VMware best practices - hence this blog post!
This post comes from the angle of integrating NetApp ONTAP with VMware vSphere. It is more SAN-focused (for my use case), so I've skimmed over the NFS best practices (links are below for further reading).
Top Links
- VMware vSphere with ONTAP
- This web documentation replaces the previously published Technical Report TR-4597: VMware vSphere with ONTAP
- Broadcom | VMware | Hardware Compatibility Guide
- for VMware Cloud Foundation (VCF/VVF) and VMware vSphere
- Interoperability Matrix Tool
- NetApp Interoperability Matrix
- ONTAP tools for VMware vSphere 10 (Documentation)
- is a set of tools for using ONTAP storage together with vSphere
- NetApp NFS Plug-in for VMware vStorage APIs for Array Integration (VAAI)
- is a software library that integrates with the VMware Virtual Disk Libraries installed on the ESXi host.
- Active IQ Config Advisor
- you should run Active IQ Config Advisor to validate your configuration and check for common configuration errors
- Cisco - FlexPod Design Guides
- FlexPod is a converged infrastructure solution developed jointly by NetApp and Cisco. There are many VMware specific FlexPod Design Guides that are very much worth referencing.
- NetApp - FlexPod Validated Designs
- Search for the VMware validated designs.
- NetApp | Lab On Demand
- You can find some great VMware with NetApp labs here if you have access. Such as:
- Easier Data Management with ONTAP Tools for VMware vSphere
- Protect Virtualized Workloads with SnapCenter Plug-in for VMware vSphere
- IT Optimization and VMware Consolidation with Data Infrastructure Insights
- Protecting VMware vSphere VMs with NetApp and Veeam
- Improving Performance with NVMe/TCP (not VMware specific, but useful if you want to understand configuration steps on the ONTAP side)
Review of VMware vSphere with ONTAP
My brief notes ("as brief as they can be but no briefer")
- From: vSphere datastore and protocol features overview
- Six protocols are supported for connecting VMware vSphere to datastores on an ONTAP system:
- FCP, iSCSI, NVMe/FC, NVMe/TCP, NFS v3, and NFS v4.1 (see the table for supported features)
- NetApp recommends using in-guest iSCSI for Microsoft clusters rather than multiwriter-enabled VMDKs in a VMFS datastore.
- Datastores using NVMe-oF or NFS v4.1 require vSphere Replication. Array-based replication for NFS v4.1 is not currently supported by SRM.
- NVMe-oF (NVMe/TCP and NVMe/FC) shows remarkable gains in IOPS, reduction in latency, and up to 50% or more reduction in host CPU consumption for storage IO (an ONTAP-side provisioning sketch follows this list).
- In general, NetApp recommends using the ONTAP tools for VMware vSphere interface within vCenter to provision traditional and vVols datastores to make sure best practices are followed.
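Since this post is SAN-focused, it's worth seeing what NVMe-oF provisioning looks like on the ONTAP side when done by hand rather than via ONTAP tools. A minimal sketch, assuming ONTAP 9 CLI syntax; the SVM, subsystem, volume, namespace and host NQN values are all made up:

```
# Enable the NVMe service on the SVM
vserver nvme create -vserver svm1

# Create a subsystem for the ESXi hosts and register a host NQN with it
vserver nvme subsystem create -vserver svm1 -subsystem esxi_cluster1 -ostype vmware
vserver nvme subsystem host add -vserver svm1 -subsystem esxi_cluster1 -host-nqn nqn.2014-08.com.example:nvme:esxi01

# Create a namespace and map it to the subsystem (the NVMe equivalent of a LUN map)
vserver nvme namespace create -vserver svm1 -path /vol/nvme_vol1/ns1 -size 1t -ostype vmware
vserver nvme subsystem map add -vserver svm1 -subsystem esxi_cluster1 -path /vol/nvme_vol1/ns1
```

The ESXi side is then controller discovery and connection (via the vSphere Client or esxcli nvme fabrics), after which the namespace appears as a device you can put a VMFS datastore on.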
- General Networking:
- Separate storage network traffic from other networks.
- Jumbo frames can be used if desired and supported by your network (the sketch after this list shows where MTU is set on each side).
- NetApp recommends disabling network flow control only on the cluster interconnect ports within an ONTAP cluster.
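A minimal sketch of where these two settings live, assuming a standard vSwitch named vSwitch1, a VMkernel port vmk1, and illustrative ONTAP node/port/broadcast-domain names:

```
# ESXi: MTU 9000 on the vSwitch and on the VMkernel port
esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000

# ONTAP: MTU is set on the broadcast domain containing the storage data ports
network port broadcast-domain modify -broadcast-domain Storage-BD -mtu 9000

# ONTAP: flow control is a per-port setting (disable on cluster interconnect ports)
network port modify -node node01 -port e0a -flowcontrol-admin none
```

Remember that jumbo frames must match end to end (VMkernel port, vSwitch, physical switches, ONTAP ports); a mismatch anywhere in the path causes hard-to-diagnose problems.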
- Ethernet storage networks:
- NetApp recommends configuring the Ethernet switch ports to which ESXi hosts and ONTAP systems connect as Rapid Spanning Tree Protocol (RSTP) edge ports (Cisco PortFast).
- NetApp recommends enabling the Spanning-Tree PortFast trunk feature in environments that use Cisco PortFast and have 802.1Q VLAN trunking enabled to either the ESXi server or the ONTAP storage arrays (see the switch-side sketch after this list).
- NetApp recommends the following best practices for link aggregation:
- Use switches that support link aggregation of ports on two separate switch chassis ...
- Use LACP to create link aggregates for ONTAP storage systems with port or IP hash.
- Use an IP hash teaming policy on ESXi when using static link aggregation (e.g., EtherChannel) and standard vSwitches, or LACP-based link aggregation with vSphere Distributed Switches. If link aggregation is not used, then use "Route based on the originating virtual port ID" instead.
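As a hedged illustration of the switch side (Cisco NX-OS syntax; interface and port-channel numbers are made up), the edge-port and LACP recommendations might look like this, with the matching ONTAP interface group below:

```
! Cisco NX-OS: storage-facing trunk port as a spanning-tree edge port
! (on IOS the equivalent is "spanning-tree portfast trunk")
interface Ethernet1/10
  switchport mode trunk
  spanning-tree port type edge trunk

! LACP port-channel towards an ONTAP node (multichassis aggregation such as
! vPC would be layered on top across paired switches)
interface Ethernet1/11
  channel-group 20 mode active
```

```
# ONTAP: matching LACP interface group with IP-hash load distribution
network port ifgrp create -node node01 -ifgrp a0a -distr-func ip -mode multimode_lacp
network port ifgrp add-port -node node01 -ifgrp a0a -port e0e
```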
- From: SAN (FC, FCoE, NVMe/FC, iSCSI), RDM
- It is a NetApp best practice to have at least two LIFs per node per SVM, and to use Selective LUN Map (SLM) to limit the paths advertised to the node hosting the LUN and its HA partner.
- ONTAP SAN best practice is to use two physical ports and two LIFs per node, one for each fabric.
- For iSCSI networks, use multiple VMkernel network interfaces on different network subnets with NIC teaming when multiple virtual switches are present. You can also use multiple physical NICs connected to multiple physical switches to provide HA and increased throughput. See the sketch below:
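A hedged sketch of both sides (SVM, LIF, address and adapter names are all illustrative; newer ONTAP releases prefer -service-policy over -role/-data-protocol):

```
# ONTAP: two iSCSI LIFs on the node, one per fabric/subnet (repeat per node)
network interface create -vserver svm1 -lif iscsi_n1_a -role data -data-protocol iscsi -home-node node01 -home-port e0e -address 192.168.10.21 -netmask 255.255.255.0
network interface create -vserver svm1 -lif iscsi_n1_b -role data -data-protocol iscsi -home-node node01 -home-port e0f -address 192.168.11.21 -netmask 255.255.255.0

# ONTAP: Selective LUN Map is on by default; verify which nodes report paths
lun mapping show -vserver svm1 -fields reporting-nodes

# ESXi: point the software iSCSI adapter at one LIF per subnet (vmhba name varies)
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 192.168.10.21
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 192.168.11.21
```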
- From: NFS
- Use a single logical interface (LIF) for each SVM on each node in the ONTAP cluster.
- All currently supported versions of VMware vSphere can use both NFS v3 and v4.1. Official nconnect support arrived in vSphere 8.0 Update 2 for NFS v3, and Update 3 for NFS v4.1 (see the mount example after this list).
- + More ...
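A hedged example of the nconnect side from the ESXi CLI. The server address, export path and datastore name are made up, and the -c (connections) option assumes vSphere 8.0 Update 2 or later; check your release before relying on it:

```
# ESXi 8.0U2+: mount an NFS v3 datastore with 4 connections per session
esxcli storage nfs add -H 192.168.20.10 -s /vol/ds01 -v ds01 -c 4

# Verify the mount
esxcli storage nfs list
```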
- From: FlexGroup volumes
- Use FlexGroup volumes with vSphere if you require a single, scalable vSphere datastore with the power of a full ONTAP cluster, or if you have very large cloning workloads that can benefit from the FlexGroup cloning mechanism by constantly keeping the clone cache warm (a creation one-liner follows this list).
- + More ...
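For reference, creating a FlexGroup volume from the ONTAP CLI is a one-liner when you let ONTAP auto-provision the constituents (a sketch; the SVM, volume name, size and junction path are illustrative):

```
# ONTAP: auto-provision a FlexGroup across the cluster's aggregates
volume create -vserver svm1 -volume fg_ds01 -auto-provision-as flexgroup -size 200T -junction-path /fg_ds01 -security-style unix
```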
- From: Network configuration
- Similar to the above "General Networking". Also contains a useful table.
- (skipped some headings here)
- From: Active IQ Unified Manager
- Active IQ Unified Manager provides visibility into the VMs in your virtual infrastructure and enables monitoring and troubleshooting storage and performance issues in your virtual environment.
- (skipped some headings here)
- From: Recommended ESXi host and other ONTAP settings
- NetApp has developed a set of optimal ESXi host settings for both NFS and block protocols. These values are easily set using ONTAP tools for VMware vSphere.
- ** See the table of settings - applies to ESXi hosts + volumes and LUNs created. **
- Multipath settings for performance are not configured by ONTAP tools. NetApp suggests:
- When using non-ASA systems in high-performance environments, or when testing performance with a single LUN datastore, consider changing the load balance setting of the round-robin (VMW_PSP_RR) path selection policy (PSP) from the default IOPS setting of 1000 to a value of 1. See: Adjusting Round Robin IOPS limit from default 1000 to 1
- NetApp recommends using the latency option in environments with non-equivalent path connectivity, such as when one path has more network hops than another, or when using a NetApp ASA system. See: Viewing and Managing Storage Paths on ESXi Hosts (CLI examples for both options are below)
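Hedged CLI examples of both suggestions (the naa device identifier is illustrative; list yours first). On older ESXi releases the latency option may need enabling via an advanced host setting, whereas recent releases support it out of the box:

```
# List devices and their current path selection policy / PSP device config
esxcli storage nmp device list

# Non-ASA, high-performance case: round robin with the IOPS limit dropped from 1000 to 1
esxcli storage nmp psp roundrobin deviceconfig set --device naa.600a098038304437415d4b6a59684a52 --type iops --iops 1

# ASA systems or non-equivalent path connectivity: latency-based round robin instead
esxcli storage nmp psp roundrobin deviceconfig set --device naa.600a098038304437415d4b6a59684a52 --type latency
```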
- Further reading: