
February 2013

February 26, 2013

Evaluating Cloud Computing Uptime SLAs

Last week's Windows Azure Storage outage got me thinking about how many of us evaluate a vendor's Service Level Agreement (SLA) before deciding to deploy workloads in the cloud. I bet many think about it only when it is too late.

Let's take the Windows Azure SLA and see how we, as consumers of cloud services, are protected in case of downtime. Before anything else, though, I would like to point out that it is in the nature of any service (public or private) to experience an outage once in a while - think about the power outages that we hear about or live through every winter. It is important to understand that this will happen, and as users of cloud services we need to be prepared for it. In this post I will use Windows Azure as an example, not because their services are better or worse than those of other cloud vendors, but to illustrate how SLAs impact us and how they differ from vendor to vendor.


Each SLA (or at least the ones that the bigger cloud vendors offer) contains a few main sections:

  • Definitions - defining the terms used in the document
  • Claims - describing how and under what terms one can submit a claim for incidents as well as how much you will be credited
  • Exclusions - describing in what cases the vendor is not liable for the outage
  • The actual SLAs - those can be of two types:
    • Guaranteed performance characteristics of the service
    • Uptime for the service

Looking at the Windows Azure SLAs web page, the first thing you will notice is that there are different SLAs for each service. You don't need to read all of them unless you use all of the services the vendor offers - the main point is that you need to read the SLAs for the services you actually use. If, for example, you use Windows Azure Storage and Windows Azure Compute, you will notice that their uptime guarantees differ by 0.05% (Compute has an uptime guarantee of 99.95%, while Storage has an uptime guarantee of 99.90%). Although this difference looks negligible at first sight, plugging the numbers into an SLA calculator shows that the expected downtime for Storage is twice as much as the expected downtime for Compute. Obviously, the closer the uptime is to 100%, the better the service.
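
To see where the "twice as much" comes from, here is a quick back-of-the-envelope calculation in Python (assuming a 30-day month; the vendors' own measurement rules may differ slightly):

    # Allowed downtime implied by an uptime guarantee, assuming a 30-day month.
    def allowed_downtime_minutes(uptime_percent, period_minutes=30 * 24 * 60):
        return period_minutes * (1 - uptime_percent / 100)

    for sla in (99.95, 99.90):
        print(f"{sla}% uptime allows {allowed_downtime_minutes(sla):.1f} minutes of downtime per month")

    # Output:
    # 99.95% uptime allows 21.6 minutes of downtime per month
    # 99.9% uptime allows 43.2 minutes of downtime per month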


The next thing to keep in mind is the timeframe over which the uptime is calculated. In the case of Windows Azure the uptime is guaranteed on a monthly basis (for both Storage and Compute). In comparison, Amazon's EC2 has an annual uptime guarantee. Monthly SLA guarantees are preferable because they avoid the case where the service experiences a severe outage in a particular month and stays up the rest of the year. To illustrate the last point, imagine that EC2 experiences a 3-hour outage in a particular month and stays up for the next 11 months. This outage is less than the 4:22:47.99 hours of acceptable downtime per year implied by the 99.95% guarantee, so you will not be eligible for any credit for it. On the other hand, if the SLA guarantee is on a monthly basis, you will be eligible for the maximum credit, because the outage severely exceeds the roughly 21 minutes of acceptable downtime per month.
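
To make the monthly-versus-annual difference concrete, here is a small sketch (again assuming a 365-day year and a 30-day month, and ignoring the vendors' exact measurement windows):

    # A single 3-hour outage measured against a 99.95% SLA, annually vs monthly.
    HOURS_PER_YEAR = 365 * 24      # 8760 hours
    HOURS_PER_MONTH = 30 * 24      # 720 hours

    allowed_yearly = HOURS_PER_YEAR * 0.0005    # ~4.38 hours (4:22:48)
    allowed_monthly = HOURS_PER_MONTH * 0.0005  # 0.36 hours (~21.6 minutes)

    outage_hours = 3.0  # the whole outage falls within one month

    print(f"Annual SLA:  credit due? {outage_hours > allowed_yearly}")   # False
    print(f"Monthly SLA: credit due? {outage_hours > allowed_monthly}")  # True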


One note about the acceptable downtime. In reality, hardware in cloud data-centers fails all the time, which may result in downtime for your particular service but will not impact other services or workloads. Such outages are normally covered by the exclusion clause of the SLA and are your own responsibility - you should follow the standard architectural practices for cloud applications and always make your services redundant in order to avoid them. The acceptable downtime metric applies to outages that impact a vast number of services or customers. Surprisingly though, nowhere in the SLAs is it mentioned how many customers need to be impacted for the vendor to acknowledge the outage. It may happen that a rack of servers in the datacenter goes down and a few dozen customers are impacted for some period of time. If you are one of them, do not expect to see an official statement from the cloud vendor about the outage. As a rule of thumb, if the outage doesn't show up in the news, you may have a hard time proving that you deserve credit.


The last thing to keep in mind when evaluating SLAs from the big cloud providers is the Beta and trial services. It is simple - there are no SLAs for services released as Beta. You are free to use them at your own risk, but don't expect any uptime guarantees from the vendor.


Where the so-called secondary cloud providers are concerned, you need to be much more careful. Those providers (and there are a lot of them) build their services on top of the bigger cloud vendors and thus are very dependent on the uptime of the big guys. Hence they don't publish standard SLAs but negotiate contracts on a customer-by-customer basis. Most of the time this is based on the size of the business you bring them, and you can count on good terms if you are a big customer. Of course, they put a lot of effort into helping you design your application for redundancy, to avoid the risk of the SLA being invoked because of a primary vendor outage. In the opposite case, where you are a single developer, you may end up without any uptime guarantees from the smaller cloud vendors.

February 11, 2013

There is more to PaaS than you think

As described in last week's post, NIST defines three different cloud computing service models - IaaS, PaaS and SaaS. IaaS and SaaS are really easy to grasp, but I see people struggling to understand the PaaS model. As a long-time application developer, though, I find the PaaS model the most compelling one for new applications. Here is why.


I will look at two examples: one from the enterprise world and one from the consumer world.

Let's start with the enterprise scenario. If you examine any enterprise application portfolio, you will find out that almost every development team has implemented its own code for handling common functionality like authentication, authorization, database access etc. There are also numerous cases where the same team developed the same functionality over and over in each new project. Even the componentization model doesn't help in this situation, because developers are often not aware of the existence of the components, or there are too many options to choose from and they cannot find the right fit for their scenario. Service Oriented Architecture (SOA) was supposed to be the holy grail for this problem, but many enterprises are still far from achieving that goal.

The next problem that you will see in enterprises is that each application has its own way of accessing common services and resources like external systems, databases and storage. This results in configuration sprawl and configuration management overhead.

Last but not least, purchasing and provisioning the necessary infrastructure for every application is a long and tedious process that significantly impacts time-to-market and adds unnecessary tension between the IT department and the business groups.

Implementing PaaS in the enterprise can alleviate each of those problems: by providing common functionality like authentication, authorization, database access, messaging etc., by reducing configuration sprawl through a central service catalog and dynamic reconfiguration, and by decoupling the underlying infrastructure from the application. In addition, PaaS leverages the underlying IaaS functionality to provide load balancing, high availability and auto-scaling at the application level.
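
As a small illustration of the decoupling point, here is a minimal sketch of how a PaaS-hosted application typically picks up its database connection from the platform instead of from its own configuration files. I am using the DATABASE_URL environment variable convention popularized by Heroku; other platforms inject similar bindings under different names:

    import os

    # The platform injects the service binding at deploy time; the application
    # simply reads it instead of maintaining its own connection configuration.
    # Falling back to a local database keeps the same code working on a developer machine.
    database_url = os.environ.get("DATABASE_URL", "postgres://localhost/devdb")

    def connect(url):
        # Placeholder for whatever database driver the application actually uses.
        print(f"Connecting to {url}")

    connect(database_url)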

All the PaaS benefits from the enterprise scenario can easily be applied to a consumer application. They are even more important in mobile, where the growth of users can become exponential. Delivering fluid, scalable and reliable functionality can be crucial to the success of every mobile application, but getting it to market fast is one of the most important parts. By leveraging PaaS services, mobile application developers can build new experiences quickly and easily, without spending time reimplementing basic functionality. Common features like location awareness, push notifications and even Instagram-like filters are offered by many public PaaS providers - mobile application developers just need to stitch those together into a new experience and publish it on the app stores. Add the device-independent nature of those services, and cross-platform rollouts become several times faster than if everything had to be implemented from scratch.

More than a decade ago, application servers advanced the way new applications are developed by offering a common framework and a set of reusable components. Platforms-as-a-Service are the next step in the evolution of application development, adding inherent cloud computing characteristics like elasticity, on-demand self-service and measured service.