Exchange Jetstress on ONTAP Best Practices

There doesn’t appear to be much published guidance on “Exchange Jetstress on ONTAP Best Practices,” and a lot of the older information applies only to hard disk drives (HDDs), not SSDs. Still, I thought it might be useful to record what I have learned here.

NetApp TRs and NVAs

NVA-1117-DEPLOY: FlexPod Datacenter with Microsoft Exchange 2016, SharePoint 2016, and NetApp AFF A300

Nothing on JetStress in this document (except a comment that it was run), but it does talk about:

6.1 Exchange 2016 Verification
Exchange 2016 was verified by using Microsoft Exchange Load Generator 2013 (LoadGen) because it tested all aspects of Exchange. Jet Stress was run during the installation to prevalidate the storage configuration.
Microsoft Exchange LoadGen 2013 Verification...

Worth checking the NVA out.

TR-4268: 200,000 Exchange Server 2013 Mailboxes on NetApp FAS8060: An Overview of Performance and Scalability

A performance-related TR. Remember that it is keyed to the FAS8060, so not SSD. There are 15 mentions of JetStress in the document, which probably makes it the best official document on the titular subject (although it is a bit old).

TR-4221: Microsoft Exchange Server 2016/2013 and SnapManager for Exchange: Best Practices Guide for Clustered Data ONTAP

Again, a bit of an old TR, but it is a Best Practices Guide. There is one mention of JetStress. The more current Exchange Best Practices TR (below) has 0 mentions of JetStress.

TR-4681: Best Practices Guide for Microsoft Exchange Server Using NetApp SnapCenter

Old JetStress Process Based Upon HDD (does not apply to SSD)

The JetStress process followed (in olden - HDD - times) was:

- Create volumes and LUNs (no data)
- Take a snapshot (snap1)
- Create databases
- Take a snapshot (snap2)
- Populate databases with JetStress data
- Take a snapshot (snap3)
- Tests
 ~ Run JetStress
 ~ Get results
 ~ Snap restore volumes to populated database snapshot (snap3)
 ~ Reboot Exchange servers
 ~ Repeat
- Once JetStress testing is complete, restore the volumes to snap2 and reboot

“This process was based upon HDD and the fact that JetStress aged the filesystem very quickly. Snapshot and rollback will guarantee consistent results (without them you’ll enter a world of pain and uncertainty.)”
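
The cycle above can be sketched as an ordered checklist. This is a minimal illustration only: the function name `jetstress_hdd_plan` and the step wording are my own, and nothing here talks to ONTAP, Exchange, or JetStress. It just makes the order explicit: every test run starts from the populated snapshot (snap3), and the final clean-up goes back to snap2.

```python
from typing import List


def jetstress_hdd_plan(test_runs: int) -> List[str]:
    """Return the ordered steps of the HDD-era snapshot/restore test cycle.

    Hypothetical helper for illustration; it only builds a list of strings.
    """
    if test_runs < 1:
        raise ValueError("test_runs must be at least 1")

    # One-time preparation, with a snapshot after each stage.
    plan = [
        "create volumes and LUNs (no data)",
        "snapshot snap1",
        "create databases",
        "snapshot snap2",
        "populate databases with JetStress data",
        "snapshot snap3",
    ]

    # Each run ends with a restore to snap3, so the next run starts
    # from the same freshly populated (un-aged) databases.
    for run in range(1, test_runs + 1):
        plan += [
            f"run JetStress (test {run})",
            f"collect results (test {run})",
            "restore volumes to snap3",
            "reboot Exchange servers",
        ]

    # Final clean-up: back to the empty databases.
    plan += ["restore volumes to snap2", "reboot Exchange servers"]
    return plan


if __name__ == "__main__":
    for number, step in enumerate(jetstress_hdd_plan(2), start=1):
        print(f"{number:2d}. {step}")
```

Printing the plan for two runs gives a numbered checklist that can be ticked off by hand during testing.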

Really Old (7-Mode) JetStress Recommendations that Apply to HDD but Not SSD

1) If using FC, ensure you aren’t starving LUNs due to HBA queue settings.

2) Configure WAFL optimization before Exchange Aggregate creation:
a. option wafl.optimize_write_once off (remember, this is for HDD only)
b. Set read_realloc on for each database volume.

3) If you forget step 2:
a. reallocate -A -o (one-time only aggregate reallocate after turning off optimize_write_once)

4) Configure Flash Cache
a. flexscale.enable on
b. flexscale.normal_data_blocks on

5) Configure Flash Cache for SME Verification
a. flexscale.lopri_blocks on
If you are not looking to increase sequential read verification throughput after an SME backup, this is a waste of cache space. With Exchange Server 2010 in a DAG configuration, Microsoft removes the verification requirement. This setting is typically only used in Exchange 2003/2007 environments.

6) Jetstress artificially ages the file system, giving WAFL no time for maintenance. 2-3 days of sustained Jetstress causes years of aging (when compared to a production Exchange Server). This means that you are compressing the change activity on the volume into an unrealistically short amount of time, without giving WAFL the same amount of time to tune the file system. This will result in degraded performance on subsequent Jetstress tests. To achieve acceptable performance with such a huge change rate, WAFL optimizations would have to be made that are not recommended for Exchange, since the production workload does not have massive change rates (see “Old JetStress Process...” above).
a. Take a snapshot of the volume before beginning the Jetstress test.
b. After running Jetstress 4-5 times, revert to the earlier snapshot before running additional tests.
c. Post restore, ensure controller CPU is not at 100% and that ‘wafl scan status’ doesn’t show any block reclamation going on. If it does, wait until it completes.
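
Steps b and c amount to two simple checks, which can be written down as a small sketch. The function names, the default of 4 runs (the low end of "4-5"), and the idea of passing in the CPU figure and reclamation state by hand are all assumptions of mine. The code does not query the controller or parse ‘wafl scan status’ output; the operator reads those values and supplies them.

```python
def should_revert(runs_since_revert: int, max_runs: int = 4) -> bool:
    """True once enough Jetstress runs have aged the volume (step b).

    max_runs=4 is an assumed default taken from the low end of "4-5".
    """
    return runs_since_revert >= max_runs


def ready_for_next_run(cpu_percent: float, reclamation_active: bool) -> bool:
    """True when it is reasonable to start the next test after a restore (step c).

    cpu_percent        -- controller CPU utilisation as observed by the operator
    reclamation_active -- whether 'wafl scan status' still shows block reclamation
    """
    return cpu_percent < 100.0 and not reclamation_active


if __name__ == "__main__":
    print(should_revert(runs_since_revert=4))
    print(ready_for_next_run(cpu_percent=35.0, reclamation_active=True))
```

Keeping the two checks separate mirrors the text: the first decides when to roll back, the second decides when the controller has settled enough to test again.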

Other Links

There are loads and loads of Microsoft Exchange Server links and guides out there, such as the one below, which goes into Jetstress 2013.

Exchange 2016 and 2013: Planning and Design Guide

Image: Microsoft Exchange Jetstress 2013 - Define Test Scenario
