NetApp Cloud Insights Best Practice: Part 1 of 2: Getting Started

** As always, anything I write on this blog is totally unofficial **

In my endeavor to come up with some Best Practices / recommendations / suggestions, I went right through the official documentation, picking out the bits that read like best practice and adding some of my own experience where I think it would be useful. The format of this post pretty much follows the headings of the official documentation.

What's New with Cloud Insights >> Check Monthly <<

"NetApp is continually improving and enhancing its products and services."

Every month, review the "What's New".

Examples:
  • February 2024 release: Collect ONTAP Switch data.
  • January 2024 release: Data Domain collector improvements.
  • December 2023 release: Brocade FOS REST data collector.
  • November 2023 release:
    • Storage Node Support Information.
    • Map VMware tags to Cloud Insights Annotations.
  • October 2023 release: Restrict access to specified domains.
  • July 2023 release: API for audit.
  • June 2023 release: Check out your usage (a breakdown of MU usage by Feature Set).
  • May 2023 release: Default ONTAP System Monitors Enabled for New Customers
    • Because business needs vary from company to company, we always recommend taking a look at the system monitors in your environment and pausing or resuming each based on your alerting needs.


Cloud Insights follows the best practices of the Shared Responsibility Model described by AWS.


Cloud Insights undergoes independent third-party audits and validations of its security, processes, and services by an external licensed CPA firm, including completion of the SOC 2 audit.


Cloud Insights stores the following information:
  • Performance data
  • Inventory data
  • Configuration data
  • Secrets
    • Secrets consist of the credentials used by the Cloud Insights Acquisition Unit to access customer devices and services. These credentials are encrypted using strong asymmetric encryption, and the private keys are stored only on the Acquisition Units and never leave the customer environment. Even privileged Cloud Insights SREs are unable to access customer secrets in plain-text due to this design.
  • Functional Data
  • User Access data
  • Storage Workload Security User Directory Data

To protect sensitive data, NetApp recommends you change the default keys and the Acquisition user password after an installation or upgrade.

It is recommended to use the SecurityAdmin tool in interactive mode, to avoid passing secrets on the command line, which can be captured in logs.

In interactive mode, the SecurityAdmin tool displays the following actions:
  1. Backup
  2. Restore
  3. Register / Update External Key Retrieval Script
  4. Rotate Encryption Keys
  5. Reset to Default Keys
  6. Change Truststore Password
  7. Change Keystore Password
  8. Encrypt Collector Password
It is recommended that vault backups be kept secure, as they include sensitive information.


Verify AU OS support. For example, for RHEL, currently (July 2024):
- Red Hat Enterprise Linux (64-bit): 7.2 through 7.9, 8.1 through 8.10, 9.1 through 9.4
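
If you want to script that check across a number of candidate hosts, here is a minimal sketch. It assumes the ranges are exactly the July 2024 ones listed above (they will change, so re-check the official requirements), and the names `SUPPORTED_RHEL` and `is_supported_rhel` are my own, not anything from NetApp.

```python
# Supported RHEL minor-version ranges, copied from the July 2024 list above.
# These are a snapshot: re-check the official requirements before relying on them.
SUPPORTED_RHEL = {7: (2, 9), 8: (1, 10), 9: (1, 4)}


def is_supported_rhel(version: str) -> bool:
    """Return True if a 'major.minor' RHEL version falls in a listed range."""
    try:
        major_text, minor_text = version.strip().split(".")[:2]
        major, minor = int(major_text), int(minor_text)
    except ValueError:
        return False  # not in 'major.minor' form
    if major not in SUPPORTED_RHEL:
        return False
    low, high = SUPPORTED_RHEL[major]
    return low <= minor <= high
```

For example, `is_supported_rhel("8.10")` is true, while `is_supported_rhel("9.5")` is false. Note the comparison is done on integers, so 8.10 is correctly treated as newer than 8.9.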

This computer should be running no other application-level software. A dedicated server is recommended.
  • Specs: 2 CPU, 8 GB RAM, and 100 Mbps/1 Gbps networking.
  • Disk Space (Linux): 100 GB recommended
    • /opt/netapp 20 GB for large environments
    • /var/log/netapp 80 GB for large environments
    • /tmp at least 1 GB available during installation
For accurate audit and data reporting, it is strongly recommended to synchronize the time on the Acquisition Unit machine using Network Time Protocol (NTP) or Simple Network Time Protocol (SNTP).
  • Do you expect to:
    • Discover more than 2500 virtual machines or 10 large (> 2 node) ONTAP clusters, Symmetrix, or HDS/HPE VSP/XP arrays on this Acquisition Unit?
      • YES: Recommend 8 GB more memory (16 GB)
    • Deploy 75 or more total data collectors on this Acquisition Unit?
      • YES: Recommend 8 GB more memory (24 GB) and 50 GB more disk space (added to log location)
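
The sizing questions above can be sketched as a small calculation. This is only my reading of the guidance: I am assuming each YES is additive on top of the 2 CPU / 8 GB RAM / 100 GB disk baseline, so that the 24 GB figure is the total when both answers are YES. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuSizing:
    cpu_cores: int
    memory_gb: int
    disk_gb: int


def recommend_au_sizing(large_discovery: bool, many_collectors: bool) -> AuSizing:
    """Apply the two sizing questions to the baseline Acquisition Unit spec.

    large_discovery: more than 2500 VMs, or the large-array case listed above.
    many_collectors: 75 or more data collectors on this Acquisition Unit.
    Assumption: each YES adds to the baseline (my reading, not an official rule).
    """
    sizing = AuSizing(cpu_cores=2, memory_gb=8, disk_gb=100)
    if large_discovery:
        sizing.memory_gb += 8  # 8 GB more memory
    if many_collectors:
        sizing.memory_gb += 8  # a further 8 GB of memory
        sizing.disk_gb += 50   # 50 GB more disk, added to the log location
    return sizing
```

With both answers YES, this gives 24 GB of memory and 150 GB of disk; with both NO, it returns the baseline unchanged.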

Verify that the server or VM hosting the Acquisition Unit meets the recommended system requirements.


"Cloud Insights uses Telegraf as its agent for collection of integration data. Telegraf is a plugin-driven server agent that can be used to collect and report metrics, events, and logs. Input plugins are used to collect the desired information into the agent by accessing the system/OS directly, by calling third-party APIs, or by listening to configured streams (i.e. Kafka, statsD, etc). Output plugins are used to send the collected metrics, events, and logs from the agent to Cloud Insights."

The current Telegraf version for Cloud Insights is 1.24.0.


Review Advanced Configuration.
Validate Test Configuration.


"Because data collectors are the primary source of information for Cloud Insights, it is imperative that you ensure that they remain in a running state."


Cloud Insights provides a number of Recommended Dashboards to provide business insights into your data. Each dashboard contains widgets designed to help answer a particular question or solve a particular problem relevant to the data currently being collected in your environment.

  • Permission Levels:
    • Account Owner
    • Administrator
    • User
    • Guest
Best practice is to limit the number of users with Administrator permissions. Most accounts should be User or Guest accounts.

It is strongly recommended to have at least two Account Owners for each Cloud Insights environment. 

You can have as many Account Owners as you wish, but best practice is to limit the owner role to a select few people.
  • Other suggestions:
    • Enable Single Sign-On (Identity Federation)
    • Enable Single Sign-On (SSO) User Auto-Provisioning
    • Restrict Access by Domain
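
If you keep an export of who has which role, the guidance above is easy to check periodically. The sketch below is purely illustrative: the input is a plain dictionary of user to role that you would have to build yourself (it does not call any Cloud Insights API), and treating "Administrators outnumber User and Guest accounts" as a finding is my interpretation of the advice.

```python
from collections import Counter


def review_roles(roles_by_user: dict[str, str]) -> list[str]:
    """Return a list of findings for a user-to-role mapping.

    Role names are assumed to match the permission levels listed above.
    """
    counts = Counter(roles_by_user.values())
    findings = []
    # At least two Account Owners are strongly recommended.
    if counts["Account Owner"] < 2:
        findings.append("Fewer than two Account Owners")
    # Interpretation: most accounts should be User or Guest, not Administrator.
    if counts["Administrator"] > counts["User"] + counts["Guest"]:
        findings.append("Administrators outnumber User and Guest accounts")
    return findings
```

An empty list means neither check fired; it says nothing about whether the people holding each role are the right ones.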

Suggestion: Review the list to see if there are any Data Collectors that you could use and are not using. As of writing, there are over 60 infrastructure-type and over 20 service-type data collectors.


Q: What Happens if I Exceed My Subscribed Usage?
A: When usage exceeds 100%, an error banner is displayed and you will have limited functionality until you do one of the following:
- Remove Data Collectors so that your Managed Unit usage is at or below your subscribed amount
- Modify your subscription to increase the subscribed Managed Unit count
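
To make the arithmetic concrete, here is a small sketch that works out the usage percentage and how many Managed Units you are over by. The function names are hypothetical, and the inputs are numbers you would read off your own subscription page.

```python
def usage_percent(used_mu: float, subscribed_mu: float) -> float:
    """Managed Unit usage as a percentage of the subscribed amount."""
    if subscribed_mu <= 0:
        raise ValueError("subscribed_mu must be positive")
    return 100.0 * used_mu / subscribed_mu


def mu_over_subscription(used_mu: float, subscribed_mu: float) -> float:
    """Managed Units to remove, or add to the subscription, to get back to 100%."""
    return max(0.0, used_mu - subscribed_mu)
```

For example, 550 MU used against 500 MU subscribed is 110% usage, so you would need to either free up 50 MU by removing data collectors or increase the subscription by at least 50 MU.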
