Friday 12 September 2008

Oversubscription...or is it contention or Thin Provisioning?

I have not blogged in a wee while as I have been working on financials and cost planning for implementing the hosted Exchange solution I am working on, rather than technical architecture.... a bit of "Excel hell".

Normally I blog about unusual or interesting findings, usually with solutions to problems. This post is a bit different as it is more conceptual.

As part of my design for hosted Exchange I obviously need to design the mailbox storage, and capacity planning is part of that design. It seems the whole world and its dog are giving away huge mailboxes by default. Exchange Labs has 10GB, GMail has some sort of increasing figure coming up to 10GB, Hotmail has 5GB and Yahoo has unlimited storage!

So the problem for a SaaS provider is, how do you cost for this? You can guarantee that none of the big vendors actually has 10GB of disk space for every one of their millions of users sitting in their data centre, just in case. The fact is that if every user has a 10GB quota on their mailbox, only a minute percentage will ever get anywhere close to it.

What you need to do is calculate a ratio of how much space is actually required vs the total quota limit. There seem to be a few different names for this. A few of my ISP colleagues continually refer to this as the contention ratio. However, after many hours Googling the science (or lack of it) behind contention ratios, I found that this is a bandwidth term, not a storage capacity term. It seems the correct term is an Over Subscription ratio. The other term that kept cropping up was Thin Provisioning, which is the practice of assigning less capacity than the total quota limit while some software fools the hosted application into thinking it has the full quota available. Thin Provisioning @ Wikipedia
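To make the ratio concrete, here is a minimal Python sketch. It assumes the Over Subscription ratio is defined as the quota promised per mailbox divided by the space a mailbox is actually expected to use; the function names and the figures in the usage note are purely illustrative and are not taken from any vendor.

```python
def oversubscription_ratio(quota_mb, expected_used_mb):
    """Quota promised per mailbox divided by the space expected to be used.

    Assumed definition for illustration: a ratio of 50 means 50MB of quota
    is promised for every 1MB of disk that is expected to be needed.
    """
    if expected_used_mb <= 0:
        raise ValueError("expected_used_mb must be greater than zero")
    return quota_mb / expected_used_mb


def physical_storage_mb(mailboxes, quota_mb, ratio):
    """Disk space to provision for a number of mailboxes at a given ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be greater than zero")
    return mailboxes * quota_mb / ratio
```

For example, with a 10GB (10240MB) quota and an expected average mailbox of 200MB, `oversubscription_ratio(10240, 200)` gives a ratio of roughly 51 to 1, and `physical_storage_mb(1000, 10240, 51.2)` suggests provisioning about 200,000MB for 1,000 mailboxes instead of the 10,240,000MB the quotas add up to.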

The next issue comes from the reason you want to allocate less storage capacity. In an internal deployment it is simple: the initial deployment is cheaper because you simply add storage as it is required. As a hosting provider, it is a little more complicated. You want to reduce the total storage required in order to reduce the cost of the solution altogether. Therefore you need to take a "bet" on how much storage is going to be needed based on your Over Subscription ratio, work out the cost of the solution per mailbox and then set a price. The main risk is that if the Over Subscription ratio is overestimated, it is difficult to recoup the cost of the extra storage once the price has been set.
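The "bet" can be put into numbers with a rough Python sketch. It assumes storage is the only cost being modelled and that it is bought at a flat price per GB; the function names, the price and the ratios in the usage note are hypothetical.

```python
def storage_cost_per_mailbox(quota_gb, ratio, cost_per_gb):
    """Storage cost per mailbox when the quota is oversubscribed by `ratio`."""
    if ratio <= 0:
        raise ValueError("ratio must be greater than zero")
    return quota_gb / ratio * cost_per_gb


def shortfall_per_mailbox(quota_gb, assumed_ratio, actual_ratio, cost_per_gb):
    """Unrecovered storage cost per mailbox if the assumed ratio was too high.

    Returns 0.0 when the real ratio turns out to be at least as high as the
    one the price was based on.
    """
    assumed = storage_cost_per_mailbox(quota_gb, assumed_ratio, cost_per_gb)
    actual = storage_cost_per_mailbox(quota_gb, actual_ratio, cost_per_gb)
    return max(0.0, actual - assumed)
```

As an illustration, pricing a 10GB mailbox on an assumed ratio of 50 to 1 at a hypothetical 5.00 per GB budgets 1.00 of storage per mailbox. If users fill their mailboxes faster and the real ratio turns out to be 10 to 1, the storage actually costs 5.00, leaving a shortfall of 4.00 per mailbox that the fixed price does not cover.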

One of the methods of determining an Over Subscription ratio is obviously to obtain statistics from our current dedicated Exchange deployments. The specific information I wanted to extract was the Total Mailbox Size and Last Logon Date (to determine mailboxes never or rarely used). The Exchange 2007 PowerShell command I have used is as follows:

Get-MailboxStatistics -Database "Staff Database" | Select-Object DisplayName, LastLogonTime, @{Name="Mailbox Size"; Expression={$_.TotalItemSize.Value.ToMB()}} | Export-Csv D:\StaffStats.csv

The most frustrating thing with the statistics I have obtained so far is that there is a very wide range. The first Exchange deployment I looked at had an average mailbox size of 30MB, the next had an average mailbox size of 200MB and the next about 1GB. With enough deployments sampled, I would expect the statistics to eventually settle into something like a Bell Curve.
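One way to compare deployments is to summarise each exported CSV the same way. The Python sketch below is only a starting point: it assumes the file has the "Mailbox Size" and "LastLogonTime" columns produced by the command above, that a mailbox which has never been logged on to has an empty LastLogonTime, and that the export may begin with a "#TYPE" line, which is skipped. The function name is my own.

```python
import csv
import statistics


def summarise_mailboxes(lines):
    """Summarise mailbox sizes from the lines of an Export-Csv file.

    `lines` is any iterable of text lines, such as an open file.
    """
    # Skip the "#TYPE ..." header line that Export-Csv may write first.
    rows = csv.DictReader(line for line in lines if not line.startswith("#TYPE"))
    sizes = []
    never_logged_on = 0
    for row in rows:
        sizes.append(float(row["Mailbox Size"]))
        # Assumption: an empty LastLogonTime means the mailbox was never used.
        if not (row.get("LastLogonTime") or "").strip():
            never_logged_on += 1
    if not sizes:
        raise ValueError("no mailbox rows found")
    return {
        "count": len(sizes),
        "mean_mb": statistics.mean(sizes),
        "median_mb": statistics.median(sizes),
        "max_mb": max(sizes),
        "never_logged_on": never_logged_on,
    }
```

It could be run with something like `with open(r"D:\StaffStats.csv", newline="") as f: print(summarise_mailboxes(f))`. Comparing the mean with the median is a quick check on the shape of the data: if the mean is much larger than the median, a handful of very large mailboxes is pulling the average up and the distribution is not symmetrical.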