Showing posts with label SaaS. Show all posts

Thursday, 4 December 2008

Are you capacity bound or performance bound?

Calculating Hosted Exchange database disks

I have read a fair few articles about calculating disk storage requirements for the Exchange Mailbox role. The first time I read the question, “Are you performance bound or capacity bound?”, I translated it rather badly in my head and answered, “Well, I want both capacity and performance.” Having worked through the calculations of how many disks are required for capacity and how many are required for performance, I realised what the question meant: which of the two requirements forces you to use more disks?

Resources

http://technet.microsoft.com/en-us/library/bb738147.aspx
http://technet.microsoft.com/en-us/library/cc671168.aspx
http://en.wikipedia.org/wiki/Gigabyte

Summary
I recently had to calculate how many disks are required to meet both performance and capacity criteria for the Exchange databases on a Hosted Exchange solution. These calculations show which RAID type should be used to meet both criteria with the fewest disks. They are independent of the storage architecture, so they apply to both SAN and DAS technology, and they cover only the Exchange databases (not Transaction Logs). All calculations are of course baseline predictions with many assumptions, and therefore cannot be 100% guaranteed. To protect my company's internal design I have changed the number of mailboxes, quotas, send/receive profile, etc. to produce different numbers. The theory is still the same though.

Assumptions

In this environment it has been calculated that there are 15,000 mailboxes per MBX server. All users are classified as “Light Users” and send/receive 25 emails a day. Mailbox servers have been specified with the maximum 8 cores and 32GB memory.

Performance Calculations

Database Cache
Database cache = (MBX Server memory - 2GB) / Total users per MBX server
Database cache = (32GB - 2GB) / 15,000 = 30,720MB / 15,000
Database cache = 2.048MB per user

Database Reads per user
Multiply the 25 messages per day by 0.0048, which results in 0.12. Next, take the amount of database cache per mailbox (2.048 MB) to the -0.65th power (2.048 ^ -0.65), which results in 0.6275. Finally, multiply the two figures, which results in database reads per user (0.12 × 0.6275 = 0.0753).

Database Writes per user
Multiply the number of messages per user (25) by 0.00152, which results in 0.038 database writes per user.
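
As a quick check of the per-user figures above, here is a minimal Python sketch. The function name and parameters are my own labels for illustration; the constants (0.0048, 0.00152, the -0.65 power and the 2GB deducted from server memory) are the ones used in the steps above, and the total is simply reads plus writes.

```python
def per_user_iops(messages_per_day, server_memory_gb, users_per_server):
    """Return (cache_mb, reads, writes, total) per user."""
    # Database cache per user in MB: (server memory - 2GB) / users
    cache_mb = (server_memory_gb - 2) * 1024 / users_per_server
    # Reads per user: (0.0048 x messages) x (cache ^ -0.65)
    reads = 0.0048 * messages_per_day * cache_mb ** -0.65
    # Writes per user: 0.00152 x messages
    writes = 0.00152 * messages_per_day
    return cache_mb, reads, writes, reads + writes


cache_mb, reads, writes, total = per_user_iops(25, 32, 15000)
print(f"cache={cache_mb:.3f}MB reads={reads:.4f} "
      f"writes={writes:.4f} total={total:.4f}")
```

Changing the message count or the memory per server shows how sensitive the read figure is to the amount of cache each user gets.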

Database I/O (Front End)
Total database IOPS per user = ((0.0048 × M) × (D ^ -0.65)) + (0.00152 × M), where M is the number of messages per user per day and D is the database cache per user in MB
Total database IOPS per user = 0.0753 + 0.038 = 0.1133
Total read IOPS per MBX server = 0.0753 x 15,000 = 1129.5
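Total write IOPS per MBX server = 0.1133 x 15,000 = 1699.5 (this uses the combined per-user figure of 0.1133; the write-only figure of 0.038 would give 570, so the write load carried forward is on the high side)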

Database I/O (Back End)
RAID 10 = Write x 2 + Read
RAID 10 = (1699.5 x 2) + 1129 = 4528 sustained IOPS
RAID5 = Write x 4 + Read
RAID5 = (1699.5 x 4) + 1129 = 7927 sustained IOPS

Disks required
Assuming a 15,000rpm disk can sustain an average of 180 IOPS and a 10,000rpm disk an average of 140 IOPS, the number of disks required to cope with the Exchange database performance is the back-end IOPS divided by the IOPS per disk, rounded up.
RAID 10, 15K Disks = 26 disks
RAID 10, 10K Disks = 33 disks
RAID5, 15K Disks = 45 disks
RAID5, 10K Disks = 57 disks
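
The back-end and disk-count steps above can be sketched in a few lines of Python. This is only an illustration: the names are hypothetical, the write penalties (2 for RAID10, 4 for RAID5) and the per-disk IOPS (180 and 140) are the assumptions stated above, and the read and write figures are the front-end numbers as used in this post.

```python
import math

# Write penalty per RAID level, as used in the back-end formulas above.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def backend_iops(read_iops, write_iops, raid):
    """Back-end IOPS = (write IOPS x write penalty) + read IOPS."""
    return write_iops * WRITE_PENALTY[raid] + read_iops


def disks_for_performance(read_iops, write_iops, raid, iops_per_disk):
    """Round up, since a fraction of a disk cannot be bought."""
    total = backend_iops(read_iops, write_iops, raid)
    return math.ceil(total / iops_per_disk)


for raid in ("RAID10", "RAID5"):
    for label, iops_per_disk in (("15K", 180), ("10K", 140)):
        n = disks_for_performance(1129.5, 1699.5, raid, iops_per_disk)
        print(f"{raid}, {label} disks = {n}")
```

Keeping the penalty and the per-disk IOPS as inputs makes it easy to rerun the numbers for a different disk type or workload.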


Capacity Calculations
Database capacity = Mailbox Capacity + Database whitespace + Dumpster
Mailbox Capacity = Total Users x Mailbox quota x OverSubscription ratio (see previous post about Oversubscription)
Quota = 1GB
OverSubscription ratio = 20%
Total Users = 15,000
Mailbox Capacity = (15000 x 1 x 20%) = 3000GB

Database Whitespace = Total number of users x Average number of emails sent per day x Average message size
Database Whitespace = 15,000 x 25 x 50KB = 17.9GB

Database Dumpster = Email retention period (days) x Total number of users x Average number of emails sent per day x Average message size
Dumpster = 14 x 15,000 x 25 x 50KB = 251GB

Database capacity = 3000GB + 17.9GB + 251GB = 3268.9GB (divided by 200GB = minimum 17 Storage Groups required)
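
For anyone who would rather not do this by hand, here is a rough Python sketch of the capacity formulas above. The function and parameter names are my own. It assumes 1GB = 1024 x 1024 KB and keeps the intermediate values unrounded, so the total comes out a fraction of a GB below the hand-rounded figure; the number of Storage Groups is unaffected.

```python
import math

KB_PER_GB = 1024 * 1024


def database_capacity_gb(users, quota_gb, oversub_ratio, mails_per_day,
                         avg_message_kb, retention_days):
    """Return (mailbox, whitespace, dumpster, total) in GB."""
    mailbox = users * quota_gb * oversub_ratio
    # Whitespace: one day's worth of mail for all users.
    whitespace = users * mails_per_day * avg_message_kb / KB_PER_GB
    # Dumpster: one day's worth of mail for each day of retention.
    dumpster = retention_days * whitespace
    return mailbox, whitespace, dumpster, mailbox + whitespace + dumpster


mailbox, whitespace, dumpster, total = database_capacity_gb(
    users=15000, quota_gb=1, oversub_ratio=0.20,
    mails_per_day=25, avg_message_kb=50, retention_days=14)

# 200GB per Storage Group, rounded up.
storage_groups = math.ceil(total / 200)
print(f"mailbox={mailbox:.0f}GB whitespace={whitespace:.1f}GB "
      f"dumpster={dumpster:.1f}GB total={total:.1f}GB")
print(f"storage groups={storage_groups}")
```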

Database Capacity disks required
RAID10 Capacity = (Number of disks x Capacity of disk) / 2
RAID5 Capacity = Capacity of disk x (Number of disks - 1)
A 400GB disk actually provides 372GB and a 300GB disk actually provides 278GB
RAID10, 300GB Disks = 24 Disks
RAID10, 400GB Disks = 18 Disks
RAID5, 300GB Disks = 13 Disks
RAID5, 400GB Disks = 10 Disks
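
The two capacity formulas can be turned around to give the disk count directly. The Python below is a sketch under the same assumptions as above: a total database capacity of roughly 3,269GB, usable disk sizes of 278GB and 372GB, RAID10 disks bought in mirrored pairs, and a single RAID5 set losing one disk to parity. The function names are hypothetical.

```python
import math


def raid10_disks(required_gb, disk_gb):
    # Usable capacity = (disks x disk size) / 2, and disks come in pairs.
    return 2 * math.ceil(required_gb / disk_gb)


def raid5_disks(required_gb, disk_gb):
    # Usable capacity = disk size x (disks - 1): one disk's worth of parity.
    return math.ceil(required_gb / disk_gb) + 1


required_gb = 3268.9
for label, disk_gb in (("300GB", 278), ("400GB", 372)):
    print(f"RAID10, {label} disks = {raid10_disks(required_gb, disk_gb)}")
    print(f"RAID5, {label} disks = {raid5_disks(required_gb, disk_gb)}")
```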


Conclusion
These calculations show that this Hosted Exchange solution is performance bound rather than capacity bound: for every combination, performance requires more disks than capacity. They also show that the RAID type should be RAID10 with 15,000rpm disks, which needs the fewest disks (26) to satisfy both criteria. To meet the performance requirement it is advisable to design the solution with 26 x 300GB 15K disks. Interestingly, the calculations with my real figures showed that RAID5 would have been the preferable solution.
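
To make the "which bound?" question concrete, here is a small Python sketch. The helper name is hypothetical, and pairing the 15K performance counts with the 300GB capacity counts is my assumption, based on the 300GB 15K disks recommended above.

```python
def binding_constraint(performance_disks, capacity_disks):
    """Return which requirement needs more disks, and how many to buy."""
    disks = max(performance_disks, capacity_disks)
    if performance_disks > capacity_disks:
        return "performance", disks
    if capacity_disks > performance_disks:
        return "capacity", disks
    return "both", disks


# Disk counts from the calculations above: (performance, capacity).
options = {
    "RAID10, 300GB 15K": (26, 24),
    "RAID5, 300GB 15K": (45, 13),
}
for name, (perf, cap) in options.items():
    bound, disks = binding_constraint(perf, cap)
    print(f"{name}: {bound} bound, {disks} disks")
```

Whichever RAID type gives the smallest of these maximums is the one that satisfies both criteria with the fewest disks.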

Friday, 12 September 2008

Oversubscription...or is it contention or Thin Provisioning?

I have not blogged in a wee while as I have been working on financials and cost planning for implementing the hosted Exchange solution I am working on, rather than technical architecture.... a bit of "Excel hell".

Normally I blog about unusual or interesting findings, usually with solutions to problems. This post is a bit different as it is a bit more conceptual.

As part of my design for hosted Exchange I obviously need to design the mailbox storage, and as part of that design, the capacity planning. It seems the whole world and its dog are giving away huge mailboxes by default. Exchange Labs has 10GB, GMail has some sort of increasing figure coming up to 10GB, Hotmail has 5GB and Yahoo has unlimited storage!

So the problem for a SaaS provider is, how do you cost for this? You can guarantee that none of the big vendors actually have 10GB of disk space for every one of their millions of users sitting in their data centre, just in case. The fact is that if every user has a 10GB quota on their mailbox, only a minute percentage will ever get anywhere close to it.

What you need to do is calculate a ratio of how much space is actually required vs the total quota limit. There seem to be a few different names for this. A few of my ISP colleagues continually refer to this as the contention ratio. However, after many hours Google'ing the science (or lack of it) of contention ratios, I found that this is a bandwidth term, not a storage capacity term. It seems the correct term is an Over Subscription ratio. The other term that kept cropping up was Thin Provisioning, which is the practice of assigning less capacity than the total quota limit, with some software fooling the hosted application into thinking it has the full quota available. Thin Provisioning @ Wikipedia

The next issue comes from the reason you want to allocate less storage capacity. In an internal deployment it is simple. The cost of the initial deployment is cheaper as you simply add storage as it is required. As a hosting provider, it is a little more complicated. You want to reduce the total storage required in order to reduce the cost of the solution altogether. Therefore you need to take a "bet" on how much storage is going to be needed based on your Over Subscription ratio, calculate the cost of the solution per mailbox and set a price accordingly. The main risk is that if actual usage turns out higher than the Over Subscription ratio allowed for, it is difficult to recoup the cost of the extra storage once the price has been set.

One of the methods of determining an Over Subscription ratio is obviously to obtain statistics from our current dedicated Exchange deployments. The specific information I wanted to extract was the Total Mailbox Size and Last Logon Date (to determine mailboxes never or rarely used). The Exchange 2007 PowerShell command I have used is as follows:

Get-MailboxStatistics -Database "Staff Database" | Select-Object Displayname,LastLogonTime, @{expression={$_.TotalItemsize.value.ToMB()};name="Mailbox Size"}| Export-Csv D:\StaffStats.csv

The most frustrating thing with the statistics I have obtained so far is that there is a very wide range. The first Exchange deployment I looked at has an average mailbox size of 30MB, the next had an average mailbox size of 200MB and the next about 1GB. Obviously the statistics should eventually show a Bell Curve.
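
Once the mailbox sizes are out of the CSV, turning them into a ratio is simple arithmetic. The Python below is only a sketch: the sample sizes are made up for illustration (they are not real statistics), the 1GB quota is an assumption, and the function name is hypothetical. It treats the Over Subscription ratio as average space used divided by quota, and prints the median alongside the mean because a few very large mailboxes can skew the average.

```python
from statistics import mean, median


def oversubscription_ratio(mailbox_sizes_mb, quota_mb):
    """Average space actually used, as a fraction of the quota."""
    return mean(mailbox_sizes_mb) / quota_mb


# Hypothetical sample of mailbox sizes in MB, not real statistics.
sizes_mb = [5, 12, 30, 45, 200, 350, 1000]
quota_mb = 1024  # assumed 1GB quota

print(f"mean={mean(sizes_mb):.0f}MB median={median(sizes_mb)}MB")
print(f"ratio={oversubscription_ratio(sizes_mb, quota_mb):.0%}")
```

With a spread as wide as the one described above, it is worth comparing the ratio per deployment before settling on a single figure.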