Windows Azure

March 20, 2013

Migrating Legacy Applications to the Cloud

With everybody jumping on the cloud computing bandwagon lately, developers and architects need to spend extra time analyzing which applications are good candidates for migration. It is wrong to believe that every legacy application can be easily migrated from traditional on-premises infrastructure to any cloud computing environment. Therefore, such migration efforts should be approached carefully and systematically.

Let's look at a couple of issues that you may face when trying to migrate legacy applications to the cloud.

Client-Server Applications

Client-server applications are characterized by tight coupling between the business logic and the data tier. Most of the time the business logic is implemented as stored procedures in the database, and pulling it out can be a substantial effort. In addition, such applications establish a sticky session between the client and the server, which violates common cloud architecture patterns and complicates the migration process.

The obvious approach for migrating client-server applications to the cloud is to gradually abstract the business logic into a service layer and deploy that layer to the cloud. The cleaned-up data tier can still be hosted on the current infrastructure until it is time to either migrate the data or retire it. At a high level you should follow these steps:

  • Identify the business services that are exposed to the clients
  • Implement those services as a separate business layer
  • Deploy the new business layer on a cloud enabled infrastructure (either IaaS or PaaS)
  • Implement a thin client layer on top of the services (in certain cases you may be able to modify the existing clients to connect to the services instead of the data tier)
  • Roll out the new client to your users
  • Retire the business logic in the data tier
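
To make the service-layer idea concrete, here is a minimal Python sketch. It assumes a hypothetical loyalty-discount rule that used to live in a stored procedure. The names `OrderRepository`, `OrderService` and `InMemoryOrderRepository` are illustrative only. The rule moves into a service class, and the data tier is reached only through a narrow interface. That way the data can stay on the current infrastructure now and be migrated later without touching the clients.

```python
from typing import Dict, Protocol


class OrderRepository(Protocol):
    """Narrow data-access interface; the data tier stays behind it."""

    def get_order_total(self, order_id: int) -> float: ...

    def save_discount(self, order_id: int, discount: float) -> None: ...


class OrderService:
    """Business service exposed to clients (hypothetical rule, formerly a stored procedure)."""

    def __init__(self, repo: OrderRepository) -> None:
        self.repo = repo

    def apply_loyalty_discount(self, order_id: int, years_as_customer: int) -> float:
        total = self.repo.get_order_total(order_id)
        rate = 0.10 if years_as_customer >= 5 else 0.0  # illustrative business rule
        discount = round(total * rate, 2)
        self.repo.save_discount(order_id, discount)
        return discount


class InMemoryOrderRepository:
    """Stand-in for the legacy database, useful for testing the service in isolation."""

    def __init__(self, totals: Dict[int, float]) -> None:
        self.totals = dict(totals)
        self.discounts: Dict[int, float] = {}

    def get_order_total(self, order_id: int) -> float:
        return self.totals[order_id]

    def save_discount(self, order_id: int, discount: float) -> None:
        self.discounts[order_id] = discount
```

In production, a repository backed by the existing database would replace the in-memory one. The new thin client would call `OrderService` over the network instead of calling the stored procedure.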

This approach provides a smooth migration because it postpones the data migration, a highly critical business component, to a later stage. In the meantime the organization gains important knowledge and discovers potential issues with the cloud technologies.

Scheduled Tasks

Scheduled tasks or batch jobs are another legacy application pattern that can introduce challenges when migrating to the cloud. Such applications are triggered either at certain intervals or by the delivery of a new batch of data. Most of the time the latter approach involves transferring files between machines. Two things at the core of such applications contradict modern cloud architectural patterns:

  • The reliance on always-up machines that will trigger the execution at certain intervals
  • The reliance on always-available file system used for file exchange

The functionality that such applications provide is easily achieved through the queue-centric workflow pattern, as described by Bill Wilder in his book Cloud Architecture Patterns. However, redesigning those legacy applications to use message queues can be a substantial implementation effort. Hence you should approach the migration in phases. For jobs that rely on file transfers you can use these steps:

  • Change the jobs to use cloud storage instead of local file systems
  • Add functionality at the delivery side to drop a message in the queue in addition to dropping the file
  • Remove the polling functionality in the processing job and instead use the message in the queue as a triggering mechanism

For scheduled tasks you need to change the implementation so that messages in the queue, instead of time intervals, trigger the tasks.
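
The phased approach above can be sketched in Python with in-process stand-ins. A dictionary plays the role of cloud storage, and `queue.Queue` plays the role of a cloud message queue. These substitutes and the function names are assumptions for illustration, not any vendor's API. The sketch shows two things: the worker blocks on the queue instead of polling a folder, and a scheduler only enqueues a message rather than running the work itself.

```python
import queue
import threading

# Hypothetical stand-ins: a dict for cloud blob storage, queue.Queue for a cloud queue.
blob_storage: dict[str, bytes] = {}
work_queue: "queue.Queue[str | None]" = queue.Queue()


def deliver_batch(name: str, data: bytes) -> None:
    """Delivery side: write the file to shared storage, then drop a message naming it."""
    blob_storage[name] = data
    work_queue.put(name)


def scheduled_trigger(task_name: str) -> None:
    """Scheduler side: enqueue a message; whichever worker picks it up does the work."""
    work_queue.put(task_name)


def worker(results: list[str]) -> None:
    """Processing job: blocks on the queue instead of polling a folder on a timer."""
    while True:
        name = work_queue.get()
        if name is None:  # sentinel used in this sketch to stop the worker
            work_queue.task_done()
            break
        data = blob_storage.get(name, b"")  # scheduled tasks carry no file
        results.append(f"{name}:{len(data)}")
        work_queue.task_done()
```

To try it, start `worker` in a `threading.Thread`, call `deliver_batch` and `scheduled_trigger`, then put `None` on the queue to stop the worker. Because workers only react to messages, you can later run several of them on different machines without changing the delivery side.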

You can achieve additional benefits if you make MapReduce part of your modern application design.

Scale Up Applications

Last but not least are applications that rely on additional local resources in order to handle increased loads. Such resources can be CPU speed, memory or disk storage. Unfortunately, such applications are hard to migrate to the cloud unless they are redesigned to use horizontal instead of vertical scaling. Most of the time such challenges arise in the data tier of the application and can be solved through data sharding.

The process for migration involves:

  • Analyzing the data and potential de-normalization
  • Identifying the shard key
  • Splitting the data amongst the shards
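
Here is a minimal Python sketch of the last two steps. It assumes a hypothetical customer table sharded on `customer_id` into four shards with made-up names. It uses a stable SHA-256 hash because Python's built-in `hash()` of strings is randomized per process, so it would route the same key differently after a restart.

```python
import hashlib

# Hypothetical shard names; in practice these would map to separate databases.
SHARDS = ["customers-shard-0", "customers-shard-1", "customers-shard-2", "customers-shard-3"]


def shard_for(customer_id: str, shards: list[str] = SHARDS) -> str:
    """Route a shard key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def split_rows(rows: list[dict], key: str = "customer_id", shards: list[str] = SHARDS) -> dict[str, list[dict]]:
    """Distribute (already de-normalized) rows among the shards by their shard key."""
    buckets: dict[str, list[dict]] = {s: [] for s in shards}
    for row in rows:
        buckets[shard_for(row[key], shards)].append(row)
    return buckets
```

Plain modulo hashing is the simplest choice, but adding a shard later remaps most keys. Consistent hashing or a lookup table of key ranges are common alternatives when the number of shards is expected to grow.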

The bottom line is that the organization gains the following from the migration approaches above:

  • Improved (and more cloud-ready) application architecture
  • Enabled economies of scale at the different tiers of the application

However, the biggest benefit is the cloud computing knowledge that the organization gains throughout the process.

February 26, 2013

Evaluating Cloud Computing Uptime SLAs

Last week's Windows Azure Storage outage made me think about how many of us evaluate the vendor's Service Level Agreement (SLA) before we decide to deploy workloads in the cloud. I bet many think about it only when it is too late.

Let's take the Windows Azure SLA and see how we, as consumers of cloud services, are protected in case of downtime. First, though, I would like to point out that it is in the nature of any service (public or private) to experience an outage once in a while; think about the power outages that we hear about or live through every winter. It is important to understand that this will happen and that as users of cloud services we need to be prepared for it. In this post I will use Windows Azure as an example. This is not because its services are better or worse than those of other cloud vendors, but to illustrate how SLAs impact us and how they differ from vendor to vendor.


Each SLA (or at least the ones that the bigger cloud vendors offer) contains a few main sections:

  • Definitions - defining the terms used in the document
  • Claims - describing how and under what terms one can submit a claim for incidents as well as how much you will be credited
  • Exclusions - describing in what cases the vendor is not liable for the outage
  • The actual SLAs - these can be of two types:
    • Guaranteed performance characteristics of the service
    • Uptime for the service

Looking at the Windows Azure SLAs web page, the first thing you will notice is that there is a different SLA for each service. You don't need to read all of them unless you use all of the services the vendor offers; the main point is that you need to read the SLAs for the services you use. If, for example, you use Windows Azure Storage and Windows Azure Compute, you will notice that their uptime guarantees differ by 0.05% (Compute has an uptime guarantee of 99.95% while Storage has an uptime guarantee of 99.90%). This difference looks negligible at first sight. However, an SLA calculator will show you that the allowed downtime for Storage is twice that for Compute. Obviously, the closer the uptime guarantee is to 100%, the better the service.
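
The arithmetic behind an SLA calculator is simple enough to check yourself. The following minimal Python sketch assumes a 30-day month as the measurement period; real months have 28 to 31 days, which shifts the numbers slightly. With that assumption, Compute's 99.95% allows about 21.6 minutes of downtime per month and Storage's 99.90% allows about 43.2 minutes.

```python
from datetime import timedelta


def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Downtime an uptime guarantee permits over one measurement period."""
    return period * (1 - uptime_percent / 100)


month = timedelta(days=30)  # assumption: a 30-day month
compute = allowed_downtime(99.95, month)  # Compute guarantee quoted above
storage = allowed_downtime(99.90, month)  # Storage guarantee quoted above
print(compute, storage, storage / compute)
```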


The next thing to keep in mind is the timeframe over which uptime is calculated. In the case of Windows Azure, uptime is guaranteed on a monthly basis (for both Storage and Compute). In comparison, Amazon's EC2 has an annual uptime guarantee. Monthly SLA guarantees are preferable because they cover the case where the service experiences a severe outage in one particular month and stays up for the rest of the year. To illustrate the point, imagine that EC2 experiences a 3-hour outage in a particular month and stays up for the next 11 months. This outage is within the 99.95% annual guarantee, which allows roughly 4 hours and 23 minutes of downtime per year, so you will not be eligible for a credit. If, on the other hand, the SLA guarantee is on a monthly basis, you will be eligible for a credit because the outage far exceeds the roughly 21 minutes of acceptable downtime per month.
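
The 3-hour example can be checked with a few lines of Python. This sketch assumes a 30-day month and a 365-day year and ignores vendor-specific credit tiers. It only answers whether the outage breaches the guarantee within the measurement window.

```python
def breaches_sla(outage_hours: list[float], uptime_percent: float, window_hours: float) -> bool:
    """True if total downtime inside one measurement window exceeds what the guarantee allows."""
    allowed_hours = window_hours * (1 - uptime_percent / 100)
    return sum(outage_hours) > allowed_hours


outage = [3.0]  # a single 3-hour outage, as in the example above
annual = breaches_sla(outage, 99.95, window_hours=365 * 24)  # allowed: about 4.38 h
monthly = breaches_sla(outage, 99.95, window_hours=30 * 24)  # allowed: about 0.36 h
print(annual, monthly)  # False True
```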


One note about acceptable downtime. In reality, hardware in cloud datacenters fails all the time. This may cause downtime for your particular service without impacting other services or workloads. Such outages are normally covered by the exclusion clause of the SLA and are your own responsibility. To avoid them, follow standard architectural practices for cloud applications and always make your services redundant. The acceptable downtime metric is calculated for outages that impact a large number of services or customers. Surprisingly, though, none of the SLAs mentions how many customers need to be impacted for the vendor to report the outage. It may happen that a rack of servers in the datacenter goes down and a few dozen customers are impacted for some time. If you are one of them, do not expect to see an official statement from the cloud vendor about the outage. As a rule of thumb, if the outage doesn't show up in the news, you may have a hard time proving that you deserve a credit.
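
The standard formula for parallel components shows why redundancy helps. If each instance is available with probability a and failures are independent, at least one of n instances is up with probability 1 - (1 - a)^n. Here is a minimal Python sketch. The 99% single-instance figure is a hypothetical input. The independence assumption breaks down when instances share a rack or power supply, which is why redundant instances need to be spread across separate hardware.

```python
def redundant_availability(single: float, instances: int) -> float:
    """Availability of n independent redundant instances: 1 - P(all are down)."""
    if not 0.0 <= single <= 1.0 or instances < 1:
        raise ValueError("single must be in [0, 1] and instances must be >= 1")
    return 1 - (1 - single) ** instances


for n in (1, 2, 3):
    # 0.99 is a hypothetical per-instance availability, not a vendor figure
    print(n, redundant_availability(0.99, n))
```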


The last thing to keep in mind when evaluating SLAs from big cloud providers is Beta and trial services. It is simple: there are no SLAs for services released in Beta. You are free to use them at your own risk, but don't expect any uptime guarantees from the vendor.


Where so-called secondary cloud providers are concerned, you need to be much more careful. These providers (and there are a lot of them) build their services on top of the bigger cloud vendors and thus depend heavily on the uptime of the big guys. Hence they don't publish standard SLAs but negotiate contracts on a customer-by-customer basis. Most of the time the terms depend on the amount of business you bring them, and you can count on good terms if you are a big customer. Of course, they also put a lot of effort into helping you design your application for redundancy, so that they avoid having to pay out under the SLA because of a primary vendor outage. Conversely, if you are a single developer, you may end up without any uptime guarantees from smaller cloud vendors.

January 21, 2013

Essential Cloud Computing Characteristics

If you ask five different experts what cloud computing is, you may get five different opinions, and all five may be correct. The best definition of cloud computing that I have ever found is the National Institute of Standards and Technology (NIST) Definition of Cloud Computing. According to NIST, the cloud model is composed of five essential characteristics, three service models, and four deployment models. In this post I will look only at the essential characteristics and compare them to traditional computing models; in future posts I will look at the service and deployment models.

Because computing always implies resources (CPU, memory, storage, networking etc.), the premise of cloud is an improved way to provision, access and manage those resources. Let's look at each essential characteristic of the cloud:

On-Demand Self-Service

Essentially, this means that you (as a consumer of the resources) can provision resources at any time you want, and you can do so without assistance from the resource provider.

Here is an example. In the old days, if your application needed additional computing power to support a growing load, the process you went through was, briefly, as follows. First, call the hardware vendor and order new machines. Once the hardware arrives, install the operating system, connect the machine to the network, configure any firewall rules, etc. Next, install your application and add the machine to the pool of machines that already handle the load for your application. This is a very simplistic view of the process, but it still requires you to interact with many internal and external teams to complete it. These include, but are not limited to, hardware vendors, IT administrators, network administrators, database administrators and operations. As a result, it can take weeks or even months to get the hardware ready to use.

Thanks to cloud computing, though, you can reduce this process to minutes. The whole lengthy process comes down to a click of a button or a call to the provider's API, and you can have the additional resources available within minutes, without any assistance from the provider. Why is this important?

Because in the past the process involved many steps and usually took months, application owners often over-provisioned the environments that host their applications. Of course, this results in huge capital expenditures at the beginning of the project, resource underutilization throughout the project, and huge losses if the project doesn't succeed. With cloud computing, though, you are in control and can provision only enough resources to support your current load.

Broad Network Access

Well, this is not something new: we've had the Internet for more than 20 years already, and the cloud did not invent it. NIST says that the cloud promotes the use of heterogeneous clients (like smartphones, tablets etc.), but I think this would be possible even without the cloud. However, in my opinion the cloud enabled one important thing that would be very hard to do with the traditional model: it made it easier to bring your application closer to your users around the world. "What is the difference?", you will ask. "Isn't that the same as the Internet or the Web?" Yes and no. Thanks to the Internet you were able to make your application available to users around the world, but the user experience differed significantly from one part of the world to another. Let's say that your company is based in California and you have a very popular application with millions of users in the US. Because you are based in California, all servers that host your application are either in your basement or in a nearby datacenter, so that you can easily go and fix any hardware issues that may occur. Now think about the experience that your users will get across the country! People on the East Coast will see slower response times and possibly more errors than people in the West. If you wanted to expand globally, these problems would be amplified. The way to solve this issue was to deploy servers on the East Coast and in every other part of the world that you wanted to expand to.
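
Part of the East Coast penalty is plain physics. The following Python sketch estimates a lower bound on round-trip time from the great-circle (haversine) distance. It assumes that light in optical fiber covers roughly 200 km per millisecond. Real routes are longer than the great circle and add routing and queuing delay, so measured latency will be higher. The city coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: light in fiber travels at roughly 200,000 km/s


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over fiber; real latency is higher."""
    return 2 * distance_km / FIBER_KM_PER_MS


SAN_FRANCISCO = (37.7749, -122.4194)
LOS_ANGELES = (34.0522, -118.2437)
NEW_YORK = (40.7128, -74.0060)

sf_to_la = great_circle_km(*SAN_FRANCISCO, *LOS_ANGELES)
sf_to_ny = great_circle_km(*SAN_FRANCISCO, *NEW_YORK)
print(f"SF-LA: {sf_to_la:.0f} km, at least {min_round_trip_ms(sf_to_la):.1f} ms round trip")
print(f"SF-NY: {sf_to_ny:.0f} km, at least {min_round_trip_ms(sf_to_ny):.1f} ms round trip")
```

A single page load often needs several round trips, so this floor is paid many times over. Serving users from a nearby region removes most of it.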

With cloud computing though you can just provision new resources in the region you want to expand to, deploy your application and start serving your users.

Again, it comes down to cost: the cost of deploying new datacenters around the world versus just using resources on demand and releasing them if you are not successful. Because the cloud is broadly accessible, you can rely on being able to provision resources in different parts of the world.

Resource Pooling

One can argue whether resource pooling is good or bad. The part that raises the most concern among users is the colocation of applications on the same hardware or on the same virtual machine. You will often hear that this compromises security, can impact your application's performance and can even bring it down. Those were real concerns in the past, but with advances in virtualization technology and the latest application runtimes you can consider them outdated. That doesn't mean, however, that you should not think about security and performance when you design your application.

The good side of resource pooling is that it enables cloud providers to achieve higher application density on a single piece of hardware and much higher resource utilization (sometimes as high as 75% to 80%, compared to 10%-12% in the traditional approach). As a result, the price of resource usage continues to fall. Another benefit of resource pooling is that resources can easily be shifted to where the demand is, without the customer needing to know where those resources come from or where they are located. Once again, as a customer you can request from the pool as many resources as you need at a given time. Once you are done using them, you can return them to the pool so that somebody else can use them. Because you as a customer are not aware of the size of the resource pool, your perception is that the resources are unlimited. In contrast, in the traditional approach application owners have always been constrained by the resources available on a limited number of machines (i.e. the ones that they have ordered and installed in their own datacenter).
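
To see what the utilization figures above mean in hardware, here is a back-of-the-envelope Python sketch. The demand of 12 fully busy server-equivalents is a hypothetical input. The 12% and 80% utilization values are the ends of the ranges quoted above. Integer arithmetic keeps the ceiling exact.

```python
def servers_needed(busy_server_equivalents: int, utilization_pct: int) -> int:
    """Machines needed when each one runs at utilization_pct on average (ceiling division)."""
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in (0, 100]")
    return -(-busy_server_equivalents * 100 // utilization_pct)


traditional = servers_needed(12, 12)  # 100 machines at 12% utilization
pooled = servers_needed(12, 80)       # 15 machines at 80% utilization
print(traditional, pooled)
```

The same workload needs almost seven times fewer machines at pooled utilization levels. That saving is what lets providers keep lowering prices.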

Rapid Elasticity

Elasticity is tightly related to the pooling of resources and allows you to easily expand and contract the amount of resources your application is using. The best part here is that this expansion and contraction can be automated and thus save you money when your application is under light load and doesn't need many resources.

To achieve this elasticity in the traditional case, the process would look something like this. When the load on your application increases, you need to power up more machines and add them to the pool of servers that run your application. When the load decreases, you start removing servers from the pool and then powering them off. Of course, we all know that nobody does this because it is much more expensive to constantly add and remove machines from the pool, so everybody runs the maximum number of machines all the time at very low utilization. And we all know what happens if the resource planning is not done right and the load is so heavy that the maximum number of machines cannot handle it: errors increase, requests are dropped and customers are unhappy.

In the cloud scenario, where you can add and remove resources within minutes, you don't need to spend a great deal of time on capacity planning. You can start very small, monitor the usage of your application and add more and more resources as you grow.
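
Here is a minimal sketch of an automated scaling rule in Python. It assumes the application reports its average CPU utilization as a fraction. The thresholds, the floor of two instances and the cap of 20 are hypothetical values chosen for illustration. Each cloud platform exposes its own autoscaling settings, and this is not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    """Hypothetical threshold-based scaling rule."""

    scale_out_above: float = 0.75  # average CPU fraction that triggers adding an instance
    scale_in_below: float = 0.25   # average CPU fraction that triggers removing an instance
    min_instances: int = 2         # keep at least two instances for redundancy
    max_instances: int = 20        # cap spending

    def desired_instances(self, current: int, avg_cpu: float) -> int:
        """Return the instance count for the next interval, moving one step at a time."""
        if avg_cpu > self.scale_out_above:
            target = current + 1
        elif avg_cpu < self.scale_in_below:
            target = current - 1
        else:
            target = current
        return max(self.min_instances, min(self.max_instances, target))
```

Keeping a wide gap between the scale-out and scale-in thresholds, and moving one instance at a time, prevents the rule from flapping when the load hovers around a single threshold.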

Measured Service

To make money, cloud providers need the ability to measure resource usage. Because in most cases cloud monetization is based on the pay-per-use model, they need to be able to give customers a breakdown of which resources they have used and how much. As mentioned in the NIST definition, this provides transparency for both the provider and the consumer of the service.

The ability to measure resource usage is important to you, the consumer of the service, in several ways. First, historical data lets you budget for the future growth of your application. It also allows you to better budget new projects that deliver similar applications. Finally, it helps application architects and developers optimize their applications for lower resource utilization (in the end, everything comes down to dollars on the monthly bill).
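
Pay-per-use billing boils down to multiplying each metered quantity by its unit price. The Python sketch below uses hypothetical meter names and made-up unit prices purely for illustration; real rates come from your provider's price list. `Decimal` avoids floating-point rounding errors in money amounts.

```python
from decimal import Decimal

# Hypothetical meters and unit prices, for illustration only.
RATES = {
    "compute_hours": Decimal("0.12"),
    "storage_gb_month": Decimal("0.095"),
    "egress_gb": Decimal("0.12"),
}


def monthly_bill(usage: dict[str, Decimal]) -> dict[str, Decimal]:
    """Itemized charge per metered resource, rounded to cents, plus a total."""
    lines = {meter: (qty * RATES[meter]).quantize(Decimal("0.01")) for meter, qty in usage.items()}
    lines["total"] = sum(lines.values(), Decimal("0"))
    return lines


usage = {
    "compute_hours": Decimal("1440"),   # e.g. two instances running for a 30-day month
    "storage_gb_month": Decimal("100"),
    "egress_gb": Decimal("50"),
}
print(monthly_bill(usage))
```

Keeping these itemized bills month over month gives you the historical data the paragraph above describes for budgeting and optimization.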

On the other hand, it helps cloud providers better optimize their datacenter resources and provide higher density per hardware. It also helps them with capacity planning, so that they don't end up at 100% utilization with no excess capacity to cover unexpected consumer growth.

Compare this to the traditional approach, where you never knew how much of your compute capacity was utilized, how much of your network capacity was used, or how much of your storage was occupied. In rare cases companies were able to collect such statistics, but these were almost never used to provide financial benefit for the enterprise.

With these five essential characteristics in mind, you should be able to recognize the "true" cloud offerings available on the market. In the next posts I will go over the service and deployment models for cloud computing.

September 16, 2012

What is the Difference Between Apprenda and Windows Azure?

Since I started at Apprenda, one of the most common questions I hear is: "What is the difference between Apprenda and Windows Azure?" Let me take a quick stab at what both platforms offer and how you can use their features in a complementary way.

First, let's look at it from a platform-as-a-service (PaaS) definition point of view. As you may already know, both Apprenda and Windows Azure offer PaaS functionality. However, PaaS is a really broad term used widely in the industry, so we need to make sure we use the same criteria when we compare two offerings. Hence we will stick to Gartner's Reference Model for PaaS, which allows us to make an apples-to-apples comparison between the services. According to that definition, PaaS is a "category of cloud services that deliver functionality provided by platform, communication and integration middleware". Gartner also lists typical services offered by a PaaS, so let's see how Apprenda and Windows Azure compare on those:

  • Application Servers
    Both Apprenda and Windows Azure leverage the functionality of Microsoft's .NET Framework and IIS server.
    In the case of Apprenda, IIS is used to host the front-end tier of your applications, while Apprenda's proprietary WCF container is used to host any services.
    In comparison when you develop applications for Windows Azure you use the Web Role to host your application's front-end and a Worker Role to host your services. If you use Windows Azure web sites then all your front-end and business logic is hosted in IIS in a multi-tenant manner.
  • Integration Middleware
    While Windows Azure offers Service Bus in the cloud, at this point in time Apprenda does not have its own Service Bus implementation. However, Apprenda applications can easily integrate with any existing Service Bus implementation, on premises or in the cloud.
  • Portals and other user experience enablers
    Both Apprenda and Windows Azure offer rich user experience.
    Apprenda has the System Operations Center portal, which is targeted at platform owners; the Developer Portal, which is the main application management tool for development teams; and the User Portal, where end users (or tenants) can subscribe to applications provided by development teams. Apprenda also has rich documentation available at http://docs.apprenda.com/ as well as an active support community at http://support.apprenda.com. In addition, when applications are deployed in multi-tenant mode on Apprenda, you can completely customize the login page, which allows for white-labeling support.
    Windows Azure, on the other hand, offers the Management Portal available at http://windows.azure.com. The Windows Azure Management Portal is targeted at the developers who use the platform to host their applications. Unlike Apprenda, though, and because Windows Azure is a public offering (I will come back to this later on), the platform is managed by Microsoft and platform management functionality is not exposed to the public. Windows Azure also offers a Marketplace, available at http://datamarket.azure.com/, where developers can publish their applications and services and end users can subscribe to them. Extensive documentation for Windows Azure is available on the main web site at https://www.windowsazure.com/en-us/develop/overview/.
  • Database Management Services (DBMS)
    Both platforms offer rich database management functionality.
    Apprenda leverages SQL Server to offer relational data storage functionality for applications and enables a lot of features on the data tier, like resource throttling, data sharding and multi-tenancy. In its next version, Apprenda is working to also deliver easy integration with popular NoSQL databases on a provider basis. This will allow your applications to leverage the functionality of MongoDB, Cassandra and others, as well as improved platform support like automatic data sharding.
    Windows Azure SQL Database is the RDBMS analogue on the Azure side. Unlike Apprenda, though, Windows Azure SQL Database limits databases to certain predefined sizes and requires you to handle data sharding in your application. Windows Azure Storage offers proprietary NoSQL-like functionality for applications that require large sets of semi-structured data.
  • Business Process Management Technologies
    At this point in time neither Apprenda nor Windows Azure offers built-in business process management technologies. However, applications on both platforms can leverage BizTalk Server and Windows Workflow Foundation for business process management.
  • Application Lifecycle Management Tools
    Both Apprenda and Windows Azure offer distinct features that help you through your application lifecycle and allow multiple versions of your application to be hosted on the platform.
    Applications deployed on Apprenda go through the following phases:
    • Definition - this phase is used during the initial development phase of the application or a version of the application
    • Sandbox - this phase is used during functional, stress or performance testing of the application or application version
    • Production - this phase is used for live applications
    • Archived - this phase is used for older application versions

    In addition Apprenda stores the binaries for each application version in the repository so that developers can easily keep track of the evolution of the application.
    If you use Windows Azure cloud services, the support for the application lifecycle includes two environments to choose from (Staging and Production) and a convenient way to switch between them (a.k.a. VIP Swap). It also includes the hosted version of TFS, which you can use to version, build and deploy your application. If you use Windows Azure web sites, you also have the option to use Git to push your application to the cloud. Keep in mind that at the time of this writing the TFS service is in Preview mode (and hence still free) and in the future it will be offered as a paid service in the cloud.

  • Application Governance Tools (including SOA, interface governance and registry/repository)
    At the moment neither platform offers a central repository of services, but, as mentioned above, both integrate easily with BizTalk.
    Using intelligent load balancing, both platforms keep the DNS entries for the service endpoints consistent, so you don't need to reconfigure your applications if any of the servers fail.
  • Messaging and Event Processing Tools
    Apprenda and Windows Azure differ significantly in their messaging and event processing tools.
    Apprenda offers event processing capabilities in a publish-subscribe mode. Publisher applications can send events either at application or platform level and subscriber applications can consume those. Apprenda ensures that the event is visible only at the required level (application only or cross platform) and it doesn't require any additional configuration.
    Windows Azure offers several ways to do messaging. Service Bus Queues offer first-in-first-out queueing functionality and guarantee that the message will be delivered. Service Bus Topics offer publish-subscribe messaging functionality. Windows Azure Queues is another Windows Azure service that offers similar capabilities: you can send a message to a queue, and any application that has access to the queue can process it. Whether you use Service Bus or Windows Azure Queues, though, you as the developer are solely responsible for setting proper access restrictions on your queues to prevent unauthorized access. Keep in mind that all Windows Azure services are publicly available and the burden of securing them lies on you.
  • Business Intelligence Tools and Business Activity Monitoring Tools
    At this point in time neither platform has built-in business intelligence or activity monitoring functionality.
  • Integrated Application Development and Lifecycle Tools
    Because both platforms target .NET developers you can assume good integration with Visual Studio.
    Windows Azure has a rich integration with Visual Studio that allows you to choose from different project templates, build Windows Azure deployment archives, deploy and monitor the deployment progress from within Visual Studio.
    Apprenda also offers Visual Studio project templates for applications that use different Apprenda services. It also provides an external tool that builds a deployment archive when you point it to a Visual Studio solution file. Unlike the Windows Azure package format, though, Apprenda's deployment package is an open ZIP format with a very simple folder structure, so you can use any ZIP tool to build the package. In the next version of the Apprenda SDK you will see even better Visual Studio integration, on par with what Windows Azure has to offer.
  • Integrated self-service management tools
    As mentioned above both platforms offer self-service web portals for developers. Apprenda also offers similar portals for platform owners and users as well.
    On the command-line front, Apprenda offers the Apprenda Command Shell (ACS), which gives developers the ability to script their build, packaging and application deployment.
    Similarly, the Windows Azure SDK offers a set of PowerShell scripts that connect to the Windows Azure management APIs and allow you to deploy, update, scale out/scale back etc. your application.
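
The difference between the queue and publish-subscribe (topic) styles described in the messaging comparison above can be modeled in a few lines of Python. This is an in-memory conceptual sketch, not the Service Bus or Windows Azure Queues API. A queue hands each message to exactly one receiver, while a topic delivers a copy to every subscription. The sketch deliberately has no access control, which on the real services is the developer's responsibility.

```python
from collections import deque
from typing import Optional


class MessageQueue:
    """Competing consumers: each message is handed to exactly one receiver, in FIFO order."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, msg: str) -> None:
        self._messages.append(msg)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish-subscribe: every subscription gets its own copy of each published message."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, MessageQueue] = {}

    def subscribe(self, name: str) -> MessageQueue:
        return self._subscriptions.setdefault(name, MessageQueue())

    def publish(self, msg: str) -> None:
        for subscription in self._subscriptions.values():
            subscription.send(msg)
```

With a queue, adding more receivers spreads the work. With a topic, adding subscriptions adds independent consumers of the same event stream, which is the pattern Apprenda's publish-subscribe events also follow.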

Now that we have looked thoroughly through the above bullet points from Gartner's Reference Model for PaaS, you may think that the two platforms have a lot in common and wonder why you should use one versus the other. Hence it is time to look at the differences in more detail.

  • Public vs. Private
    One of the biggest differences between Windows Azure and Apprenda is that they target complementary areas of the cloud computing space.
    As you may already know, Windows Azure is a public offering hosted by Microsoft, and so far there is no offering from Microsoft that enables Azure-like functionality in your own datacenter (DC).
    Apprenda, on the other hand, is a software layer that you can install on any Windows infrastructure and that turns this infrastructure into a Platform as a Service. Although Apprenda is mainly targeted at private datacenters, nothing prevents you from installing it on any public infrastructure like Windows Azure IaaS, Amazon AWS, Rackspace etc. Thus you can use Apprenda to enable PaaS functionality similar to Windows Azure's, either in your datacenter or on a competing public infrastructure.
  • Shared Hardware vs Shared Container
    One other big difference between Windows Azure and Apprenda is how the platform resources are managed.
    Windows Azure spins up a new Virtual Machine (VM) for each application you deploy, thus enabling you to share the hardware among different applications. Apprenda abstracts the underlying infrastructure even more and presents it as one unified pool of resources for all applications. Thus in the Apprenda case you are not limited to a one-to-one mapping between application and VM, and you can deploy multiple applications on the same VM or even on bare metal. The shared container approach that Apprenda uses allows for much better resource utilization, higher application density and true multi-tenancy than the app-to-VM approach.
    One note I need to add here: with the introduction of Windows Azure web sites, you can argue that Windows Azure also uses the shared container approach to increase application density. However, Windows Azure web sites are strictly constrained to applications that run in IIS, while Apprenda enables this functionality throughout all application tiers, including services and data.
  • Legacy vs. New Applications
    One of the biggest complaints in the early days of Windows Azure was the lack of support for legacy applications and the difficulty of migrating them to the cloud. Ever since, Microsoft has been adding functionality that makes the migration of such applications easier. Things improved significantly with the introduction of Windows Azure Infrastructure-as-a-Service (IaaS), but on the PaaS front Azure is still behind, as you need to modify your application code to run it in an Azure Web or Worker role.
    Migrating legacy applications to Apprenda, on the other hand, is much easier. In the majority of cases the only thing you need to do is repackage the binaries into an Apprenda archive and deploy them to the platform. As an added bonus you get free support for authentication and authorization (AuthN/AuthZ) and multi-tenancy, even if your application wasn't developed with that functionality in mind.
  • Billing Support
    The last comparison point I want to touch on is the billing support on both platforms.
    As you may be aware, ISVs have a hard time implementing different billing methods on Windows Azure because there is no good way to tap into the billing infrastructure of the platform: there are no standard APIs exposed, and the lag for processing billing data is significant (normally 24h).
    Apprenda, in contrast, is implemented with ISVs in mind and offers rich billing support that allows you to implement chargebacks on the functionality level (think API calls) as well as on the resource level (either allocated or consumed). This allows developers to implement different monetization methods in their applications, like charging per feature, per user or per CPU usage (the latter is similar to Google App Engine).
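The following minimal Java sketch makes the idea of functionality-level chargebacks concrete by showing per-tenant usage metering. It is hypothetical and is not Apprenda's billing API. The class name UsageMeter, the tenant/feature keys and the per-call price in cents are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-tenant usage meter (not Apprenda's API).
final class UsageMeter {
    // Composite key so tenant and feature names can never collide.
    private record Key(String tenantId, String feature) {}

    private final Map<Key, LongAdder> calls = new ConcurrentHashMap<>();

    // Record one API call of a given feature for a given tenant.
    void recordCall(String tenantId, String feature) {
        calls.computeIfAbsent(new Key(tenantId, feature), k -> new LongAdder()).increment();
    }

    long callCount(String tenantId, String feature) {
        LongAdder adder = calls.get(new Key(tenantId, feature));
        return adder == null ? 0 : adder.sum();
    }

    // Functionality-level chargeback: number of calls times an assumed price per call.
    long chargeCents(String tenantId, String feature, long centsPerCall) {
        return callCount(tenantId, feature) * centsPerCall;
    }
}
```

Resource-level chargebacks (allocated or consumed CPU, storage) would follow the same pattern, with a measured quantity accumulated per tenant in place of a call count.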

By now you should have a very good understanding of the similarities and differences between Windows Azure and Apprenda, and I bet you already have a good idea where you can use one versus the other. However, I would like to throw out a few ideas where you can use both together to get the best of both worlds. Here are a couple of use cases that you may find useful in your arsenal of solutions:

  • Burst Into the Cloud
    With the recent introduction of Windows Azure IaaS and Windows Azure Virtual Network (both still in Beta) you are no longer limited to the capacity of your private datacenter. If you add Apprenda into the mix you can create a unified PaaS layer on top of a hybrid infrastructure and allow your applications to burst into the cloud when demand increases and scale back when it decreases.
    There are several benefits you get from this.
    First, your development teams don't need to implement special code in their applications that runs conditionally on where the application is deployed (in the private DC or in the cloud). They continue to develop the applications as if they were deployed on a stand-alone server, and Apprenda abstracts the applications from the underlying infrastructure.
    Second, IT personnel can dynamically manage the infrastructure and add capacity without the need to procure new hardware. Thus they are no longer the bottleneck for applications and instead become an enabler for faster time-to-market.
  • Data Sovereignty
    For a lot of organizations putting data in the public cloud is still out of the question. Hospitals, pharmaceutical companies, banks and other financial institutions need to follow certain regulatory guidelines to ensure the sensitive data of their customers is well protected. However, such organizations still want to benefit from the cloud. By using Apprenda as a PaaS layer spanning your datacenter and Windows Azure IaaS, you can ensure that the data tier is kept in your own datacenter while the services and front-end can scale into the cloud.
  • Easy and Smooth Migration of Legacy Apps to the Cloud
    With its built-in support for legacy applications, Apprenda is a key stepping stone in the migration of those applications to Windows Azure. Using a hybrid infrastructure (your own DC plus Windows Azure IaaS) with the Apprenda PaaS layer on top, you can leverage the benefits of the cloud for applications that would otherwise need substantial re-implementation in order to run on Azure.
  • Achieve True Vendor Independence
    Last but not least, by abstracting your applications from your infrastructure with Apprenda's help you can achieve true independence from your public cloud provider. You can easily move applications between your own datacenter, Windows Azure, AWS, Rackspace and any other provider that offers Windows hosting. Even better, you can load balance between instances on any of those cloud providers and ensure that if one has a major failure your application continues to run uninterrupted.

I am pretty sure this post doesn't evaluate all possible features and capabilities of both platforms, but I hope it gives you enough understanding of the basic differences between the platforms and how you can use them together. Given that Apprenda is a close partner of Microsoft, we are working to bring both platforms together. As always, questions, feedback and your thoughts are highly appreciated.

August 23, 2012

Converting Single-Tenant to Multi-Tenant Apps

Characteristics of Successful SaaS Application

Scott Chate, the VP of Product at Corent Technologies, describes the characteristics of a successful SaaS application very well in his 2010 post Convert your Web Application to a Multi-Tenant SaaS Solution. According to his post, a successful SaaS application must possess the following characteristics:

  • It must support multi-tenancy
  • It must offer self-service sign-up
  • It must have subscription and billing mechanisms in place
  • It must scale efficiently
  • It must support monitoring and management of tenants
  • It must support user authentication and authorization for each tenant
  • It must support tenant customization

In order to achieve true multi-tenancy, which also allows the highest efficiency, your application should be able to share the database and the application logic among tenants.

However, what does this mean for application developers?

Database Redesign

The first step in the application redesign is the introduction of a tenant identifier column in each database table and view. The tenant identifier is used to filter the data that belongs to a particular tenant. This has several implications for application developers:

  • All database scripts need to be changed so they include the tenant identifier. This includes creation scripts, updates to primary and foreign keys, stored procedures etc. For example, if you have an order processing application and you used the order number as the primary key, you need to make sure that the primary key now also includes the tenant ID. Thus two different tenants can have the same order numbers if their policies require it.
  • As part of the database redesign you need to update the indices on all tables so that they take the tenant ID into account. This makes sure that database queries requiring tenant-specific information are executed with the necessary performance.
  • Next you need to update all database queries made at the business logic tier to include the tenant identifier. This has a direct impact on the source code, and depending on how well your application is architected this may be relatively easy or hard to do. If, for example, there is no designated data access layer and SQL queries are hardcoded and spread all across the code, changing them will be a nightmare.
  • Last but not least, you need to think about how you can scale the database tier. Now that you store data from multiple tenants in the same database, chances are that you will reach its limits much faster than when you had a separate database for each tenant. You need to think about how to shard the data, and whether you will do this at the application tier or at the data tier.
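The composite-key and sharding points above can be sketched in Java as follows. This is a minimal illustration under assumed names (OrderKey, ShardRouter). The tenant ID is part of the key, so two tenants may both use order number 1001. An application-tier router keeps all rows of one tenant in the same shard by hashing the tenant ID.

```java
import java.util.Objects;

// Composite primary key: tenant ID plus order number, so order numbers
// only need to be unique within a tenant.
record OrderKey(String tenantId, long orderNumber) {
    OrderKey {
        Objects.requireNonNull(tenantId, "tenantId");
    }
}

// Hypothetical application-tier shard router: every row of a tenant
// maps to the same shard, chosen from a stable hash of the tenant ID.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    int shardFor(String tenantId) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes.
        return Math.floorMod(tenantId.hashCode(), shardCount);
    }
}
```

With hash-modulo routing, most tenants move to a different shard when the shard count changes. If you expect to rebalance, an explicit tenant-to-shard lookup table is a common alternative.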

Security

The next big topic you need to consider during the redesign process is security. Although it is always about securing the data, there are two aspects here:

  • Security at runtime
  • Security at the data tier

In the true multi-tenancy case the business logic code is shared among multiple tenants. This means that users from different tenants will be handled by the same code, running not only on the same machine but even in the same process on that machine. In order to ensure that users from a particular tenant never see the data of other tenants, you need to be much more diligent about security.

Let's look at a particular scenario. Imagine that you have a mortgage calculator that calculates the monthly payments for a customer based on the principal amount and the length of the loan supplied by the customer, and the interest rate that you read from the database. Because the interest rate does not change very often and is the same for every customer, you may be tempted to cache it in a static field in your application. This may work OK for a single-tenant application, but if you want to have multiple banks using your application in a multi-tenant scenario it will be disastrous. The issue is that you cannot assume that all banks will offer the same interest rate to their customers, and the code that reads the interest rate from the database will overwrite the static variable for each tenant. In this case you will not only provide end users with misleading information but will also expose competitive information to the rest of the tenants.
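Here is a minimal Java sketch of the fix. It assumes a loader function that reads each tenant's rate from the database, stubbed here with a map. The rate is cached per tenant instead of in one static field, and the payment uses the standard amortization formula. TenantRateCache and MortgageCalculator are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Anti-pattern from the text (do NOT do this in a multi-tenant app):
//   static double cachedRate; // one value shared by every tenant in the process

// Cache keyed by tenant ID, so each bank only ever sees its own rate.
final class TenantRateCache {
    private final Map<String, Double> ratesByTenant = new ConcurrentHashMap<>();
    private final Function<String, Double> loader; // assumed database lookup

    TenantRateCache(Function<String, Double> loader) {
        this.loader = loader;
    }

    double annualRate(String tenantId) {
        return ratesByTenant.computeIfAbsent(tenantId, loader);
    }
}

final class MortgageCalculator {
    // Amortized payment: M = P * r * (1 + r)^n / ((1 + r)^n - 1),
    // with r the monthly rate and n the number of monthly payments.
    static double monthlyPayment(double principal, double annualRate, int years) {
        int n = years * 12;
        double r = annualRate / 12.0;
        if (r == 0.0) {
            return principal / n; // interest-free loan: equal principal installments
        }
        double growth = Math.pow(1 + r, n);
        return principal * r * growth / (growth - 1);
    }
}
```

For example, a 100,000 loan over 30 years at 6% gives a monthly payment of about 599.55. A bank offering 4.5% gets a lower figure, and neither bank's rate leaks to the other.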

As we already discussed, on the data tier each tenant must be uniquely identified when accessing the data. You may want to create different logins for each tenant and give them permissions to just their view of the data, or you may want to restrict access with a special WHERE clause to achieve the same. And of course each tenant may have different access permissions for users in different roles, so you will need to keep the user authorization code from your single-tenant app (maybe with some modifications).

Last but not least, data access auditing is even more important for multi-tenant applications than for single-tenant ones. Now you need to keep track not only of which user accessed the data but also of which tenant this user belongs to, in order to be able to trace back any unauthorized access.
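The data-tier filtering and auditing described above can be sketched with an in-memory stand-in for a table that has a tenant_id column. This is a hypothetical illustration (TenantRepository, TenantRow and AuditEntry are made-up names), not a real data access layer. In a database the same rule becomes a WHERE tenant_id = ? clause on every query, and each audit entry records both the user and the tenant.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a table row that carries a tenant_id column.
record TenantRow(String tenantId, String data) {}

// Audit entry: which user, from which tenant, did what.
record AuditEntry(String tenantId, String userId, String action) {}

final class TenantRepository {
    private final List<TenantRow> rows = new ArrayList<>();
    private final List<AuditEntry> auditLog = new ArrayList<>();

    void insert(String tenantId, String userId, String data) {
        rows.add(new TenantRow(tenantId, data));
        auditLog.add(new AuditEntry(tenantId, userId, "insert"));
    }

    // Equivalent of always adding "WHERE tenant_id = ?": there is no way
    // to read rows without naming the caller's tenant.
    List<String> findAll(String tenantId, String userId) {
        auditLog.add(new AuditEntry(tenantId, userId, "read"));
        List<String> result = new ArrayList<>();
        for (TenantRow row : rows) {
            if (row.tenantId().equals(tenantId)) {
                result.add(row.data());
            }
        }
        return result;
    }

    List<AuditEntry> auditLog() {
        return List.copyOf(auditLog);
    }
}
```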

 

Scale and Performance

 

I've already touched a bit on this topic in the Database Redesign section when I discussed the need for data sharding, but there are other things that you need to consider when converting your application to a multi-tenant one.

One of them is the diverse set of tenants you may have. If we take the previous example, the mortgage calculator may be used by banks of any size, from small local banks and credit unions with just a few thousand clients to big banks with millions of clients. In a multi-tenant environment you cannot expect each tenant to be the same size. You need to make sure that your application is able to serve all of them equally and that it is easy to scale out and in when the need arises. As part of the application design you need to take care of things like:

  • Throttling the requests of demanding tenants. Sometimes scaling out your application takes time, anywhere from a couple of seconds to tens of minutes, or it may even require manual intervention. In the meantime, if your application is not able to throttle the requests from the one tenant that consumes all the resources, your other tenants may be down. Hacker attacks or security issues may also be the reason for such spikes in a particular tenant's activity.
  • Avoiding code that stores the session state in memory on the server side. If you suddenly need to scale your application out, the odds are that the next request from the user will not land on the same server, and if the session state is stored in memory the user will lose all that information. You need to make sure that such state is stored either on the client side (browser cookie or local browser storage) or in a shared location like a database. Although this is true for every cloud application, not only multi-tenant ones, keep in mind that scale-out scenarios are much more common in multi-tenant applications.
  • Gracefully handling errors. Lots of things can go wrong when your application is under heavy load. Timeouts, session data loss and connectivity loss are just a few of the causes of errors. You need to make sure that such fault scenarios are easy to recover from, both on the server and on the client side.
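For the throttling point, here is a minimal per-tenant token bucket in Java. The token bucket is one common throttling technique, shown here as a sketch. The capacity, the refill rate and the TenantThrottle name are assumptions. The clock is passed in explicitly so the behavior is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-tenant token bucket: each tenant gets its own budget, so one
// noisy tenant cannot starve the others.
final class TenantThrottle {
    private final double capacity;        // maximum burst per tenant
    private final double refillPerSecond; // sustained request rate per tenant
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long nowNanos) {
            this.tokens = tokens;
            this.lastRefillNanos = nowNanos;
        }
    }

    TenantThrottle(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }

    // Returns true if the request may proceed, false if this tenant is throttled.
    boolean tryAcquire(String tenantId, long nowNanos) {
        Bucket bucket = buckets.computeIfAbsent(tenantId, id -> new Bucket(capacity, nowNanos));
        synchronized (bucket) {
            // Ignore clock values that go backwards instead of draining tokens.
            double elapsedSeconds = Math.max(0, nowNanos - bucket.lastRefillNanos) / 1_000_000_000.0;
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
            bucket.lastRefillNanos = nowNanos;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

In a web front end you would call tryAcquire(tenantId, System.nanoTime()) for each request and answer with HTTP 429 (Too Many Requests) when it returns false.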

Those are just some of the design considerations for multi-tenant applications. There are certainly platforms (like my current employer's Apprenda) that will do most of the work for you when you migrate your applications to multi-tenant ones. However, you still need to be aware of areas where such automatic conversion cannot be done. Taking a closer look at your code is always necessary in conjunction with the automation platforms.

February 14, 2012

Building Applications for the Cloud Slides

Last week I gave a talk at a local event organized by the Northwest Entrepreneur Network, where I outlined the different cloud offerings available today and gave some tips for developing cloud applications. Many of the attendees were interested in the slides, hence I am posting them on my blog for other people to look at as well.

Here is a download link for offline use (and to see the complete animations that are missing from the SlideShare version:))

Download Building Applications For The Cloud slides (5644.8K)

 

January 25, 2012

Accessing Windows Azure REST APIs with cURL

Tonight I was playing with cURL on my Mac, wondering how easy it would be to develop a few scripts to manage Windows Azure applications from a non-Windows machine. As it turns out, getting access to the Windows Azure REST APIs was quite simple. Here are the steps I had to go through in order to receive a valid response from the APIs:


Set up Windows Azure management certificate from your Mac machine

The first thing I had to do was to create a self-signed certificate that I could use for Service Management. Creating the cert with openssl (which is available on the Mac) is quite simple; just type:


openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout azure-cert.pem -out azure-cert.pem


During the creation openssl will ask you for all the necessary information like country name, organization name etc., and at the end it will generate a .pem file that contains both the public and the private key.

In order to upload the certificate to your Windows Azure subscription using the Management Portal though you need to have the certificate in PKCS12 (or .pfx) format. Here is the openssl command that will do the work:


openssl pkcs12 -export -out azure-cert.pfx -in azure-cert.pem -name "My Self Signed Cert"


Now that you have the PKCS12 file you can go ahead and upload this to your Management Certificates using the portal.


Update: Writing this in the middle of the night, I totally messed up what you need to do. You need PKCS12 only if you want to enable SSL for your service. For management you only need the public key, which you can export to a .CER file. Here is the command that you use for this:


openssl x509 -outform der -in azure-cert.pem -out azure-cert.cer


Now you can upload the .CER to the Management Certificates section using the portal.


Windows Azure Management Certificates - Management Portal Screenshot


The initial set-up is done!


Using cURL to Access Windows Azure REST APIs

Now that you have the cert created and uploaded to Windows Azure you can easily play with the REST APIs. For example if you want to list all your existing hosted services you can use the List Hosted Services API as follows:


curl -E [cert-file] -H "x-ms-version: 2011-10-01" "https://management.core.windows.net/[subscr-id]/services/hostedservices"

where:

  • cert-file is the path to the .pem file containing the certificate
  • subscr-id is your Windows Azure subscription ID

Don't forget to specify the version header (the -H flag for cURL), or Windows Azure will return an error. As a result of the call above you will receive an XML response with a list of all the hosted services in your Windows Azure subscription.
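For comparison, here is a hedged sketch of the same List Hosted Services call from Java 11+ using java.net.http.HttpClient. The endpoint and the x-ms-version value come from the cURL example above. The rest is assumed for illustration: the class name AzureManagementClient, the use of the PKCS12 (.pfx) file exported earlier for client authentication, and its password. The .pfx is used because, unlike the .CER, it contains the private key.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

final class AzureManagementClient {
    static final String API_VERSION = "2011-10-01"; // same value as the cURL -H header

    // Same URL and version header as the cURL call.
    static HttpRequest listHostedServicesRequest(String subscriptionId) {
        URI uri = URI.create("https://management.core.windows.net/"
                + subscriptionId + "/services/hostedservices");
        return HttpRequest.newBuilder(uri)
                .header("x-ms-version", API_VERSION)
                .GET()
                .build();
    }

    // Client-certificate TLS: load the .pfx (PKCS12) that holds the key pair.
    static HttpClient clientWithCertificate(String pfxPath, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream(pfxPath)) {
            keyStore.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);
        SSLContext ssl = SSLContext.getInstance("TLS");
        ssl.init(kmf.getKeyManagers(), null, null);
        return HttpClient.newBuilder().sslContext(ssl).build();
    }

    // Returns the raw XML body, like the cURL output.
    static String listHostedServices(HttpClient client, String subscriptionId)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                listHostedServicesRequest(subscriptionId), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

A call would look like listHostedServices(clientWithCertificate("azure-cert.pfx", password), subscriptionId). The request builder is kept separate so it can be inspected without network access.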


You can access any of the REST APIs by manually constructing the request and the URL as described in the Windows Azure Service Management REST API Reference.


I didn't get to any of my planned scripts, but now I can easily explore the APIs with cURL.


September 11, 2011

Demystifying physicalDirectory or How to Configure the Site Entry in the Service Definition File for Windows Azure

If you have played a bit with the sites configuration in Windows Azure, you may have discovered some inconsistent behavior between what Visual Studio does and what the cspack.exe command line does when it comes to the physicalDirectory attribute. I certainly did! Here is the problem I encountered while trying to deploy PHP on Windows Azure.

 

Project Folder Structure

I was following the instructions on Installing PHP on Windows Azure leveraging Full IIS Support but decided to use Visual Studio instead of building the package by hand. Not a good idea for this particular scenario :( After creating my cloud solution in Visual Studio I ended up with the following folder structure:

+ PHPonAzureSol
     + PHPonAzure
          …
          - ServiceConfiguration.cscfg
          - ServiceDefinition.csdef
     + PHPRole
          …
          + bin
               - install-php.cmd
               - install-php-azure.cmd
          + PHP-Azure
               - php-azure.dll
          + Sites
               + PHP
                    - index.php
          + WebPI-cmd
               …

 

Where:

  • PHPonAzureSol was my VS solution folder
  • PHPonAzure was my VS project folder containing the CSDEF and CSCFG files
  • and PHPRole was my VS project folder containing the code for my web role

The PHPRole folder contained the WebPI command line tool needed to install PHP in the cloud stored in the WebPI-cmd subfolder; the PHP extensions for Azure in the PHP-Azure subfolder; the installation scripts in the bin subfolder; and most importantly my PHP pages in Sites\PHP subfolder (in this case I had simple index.php page containing phpinfo()).

 

Configuring Site Entry in the CSDEF File

Of course my goal was to configure the site to point to the folder where my PHP files were stored. In this particular case this was the PHPonAzureSol\PHPRole\Sites\PHP folder, if you follow the structure above. This is simply done by adding the physicalDirectory attribute to the Site tag in CSDEF. Here is what my Site tag looked like:

 

<Site name="Web" physicalDirectory="..\PHPRole\Sites\PHP">
     <Bindings>
          <Binding name="Endpoint1" endpointName="Endpoint1" />
     </Bindings>
</Site>

 

 

My expectation was that with this setting in CSDEF, IIS would be configured to point to the content that comes from the physicalDirectory folder. Hence if I type the URL of my Windows Azure hosted service I should be able to see the index.php page (i.e. http://[my-hosted-service].cloudapp.net should point to my PHP code).

 

Visual Studio handling of physicalDirectory attribute

Of course when I used Visual Studio to pack and deploy my Web Role I was unpleasantly surprised. It seems Visual Studio ignores the physicalDirectory attribute from your CSDEF file, and points the site to your Web Role’s approot folder (or the content from PHPRole folder if you follow the structure above). Thus if I wanted to access my PHP page I had to type the following URL:

 

http://[my-hosted-service].cloudapp.net/Sites/PHP/index.php

 

Not exactly what I wanted :(

The reason for this is that Visual Studio calls cspack.exe with additional options (either /sitePhysicalDirectories or /sites) that overwrite the physicalDirectory attribute from CSDEF. As of now I am not aware of a way to change this behavior in VS.

Update (9-12-2011): It seems VS ignores the physicalDirectory attribute ONLY if your web site is called Web (i.e. name="Web" as in the example above). If you rename the site to something else (name="PHPWeb" for example) you will end up with the expected behavior described below. Unfortunately name="Web" is the default setting, and this may result in unexpected behavior for your application.

 

cspack.exe handling of physicalDirectory attribute

The solution to the problem is to call cspack.exe from the command line (without the above mentioned options, of course :)).

There are a few gotchas in how you call cspack.exe with the folder structure that Visual Studio creates. After a few rounds of trial and error, during which I received several errors like this one:

 

Error: Could not find a part of the path '[some-path-here]'.

 

I figured out that you should call cspack.exe from the solution folder (PHPonAzureSol in the above structure). Once I did this everything worked fine and I was able to access my index.php by just typing my hosted service’s URL.

 

How Does the physicalDirectory Attribute Work?

For those of you interested in how the physicalDirectory attribute works, here is a simple explanation.

MSDN documentation for How to Configure a Web Role for Multiple Web Sites points out that physicalDirectory attribute value is relative to the location of the Service Configuration (CSCFG) file. This is true in the majority of the cases however I think the following two clarifications are necessary:

  1. Because the attribute is present in the Service Definition (CSDEF) file the correct statement is that physicalDirectory attribute value is relative to the location of the Service Definition (CSDEF) file instead. Of course if you use Visual Studio to build your folder structure you can always assume that the Service Configuration (CSCFG) and the Service Definition (CSDEF) files are placed in the same folder. If you build your project manually you should be careful how you set the physicalDirectory attribute. This is of course important if you want to use relative paths in the attribute.
  2. This one I think is much more important than the first one, and it states that you can use absolute paths in the physicalDirectory attribute. The physicalDirectory attribute can contain any valid absolute path on the machine where you build the package. This means that you can point cspack.exe to include any random folder from your local machine as your site’s root.
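The two rules above can be modeled with java.nio.file.Path. This sketch models the resolution rule as described here and does not reproduce cspack.exe's actual implementation. SitePaths is a hypothetical name. The example uses forward slashes so it runs on any OS; on Windows you would write ..\PHPRole\Sites\PHP.

```java
import java.nio.file.Path;

final class SitePaths {
    // A relative physicalDirectory is resolved against the folder that
    // contains the Service Definition (CSDEF) file; an absolute one is
    // used as-is (Path.resolve returns an absolute argument unchanged).
    static Path resolvePhysicalDirectory(Path csdefFile, String physicalDirectory) {
        Path csdefFolder = csdefFile.toAbsolutePath().getParent();
        return csdefFolder.resolve(physicalDirectory).normalize();
    }
}
```

If the CSDEF path is given as a relative path, toAbsolutePath() anchors it at the current working directory. This is one reason it pays to be deliberate about the folder you run the packaging from.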

Here is how this works.

What cspack.exe does is take the content of the folder configured in the physicalDirectory attribute and copy it under the [role]/sitesroot/[num] folder in the package. Here is what my package structure looked like (follow the path in the address line):

 

cspack.exe Package Structure - Screenshot

 

During deployment IIS on the cloud VM is configured to point the site to sitesroot\[num] folder, and serve the content from there. Here is how it is deployed in the cloud:

 

Deployed Site Folder on the Cloud VM - Screenshot

 

And here is the IIS configuration for this cloud VM:

 

IIS Site Configuration on the Cloud VM - Screenshot

August 30, 2011

Microsoft Windows Azure Development Cookbook Review

Recently I got my hands on the Microsoft Windows Azure Development Cookbook written by Neil Mackenzie (@mknz), and honestly I was impressed with the quality of the examples in it. As a disclaimer, I have to say that the book was sent to me by the publisher; however, there are no incentives for me to write this review. Once again, I think the book provides very good hands-on examples for the complete set of services that Windows Azure offers, starting with Windows Azure Storage, going through Hosted Services and ending with SQL Azure and AppFabric.

The value I saw in the book is that 1) it explains each of the services with real examples and walks you step by step through what is happening, and 2) it is written from the customer's point of view (Neil is an MVP who has been working with Windows Azure since it was announced at PDC08). I personally found the following three chapters very valuable, even for me:

  • Using the Shared Access Signature for a container blob
  • Using the retry policies with blob operations
  • Autoscaling with the Windows Azure Service Management REST API

I am sure that I will refer to those three chapters very often for my own work on Windows Azure.

As a suggestion to Neil, in the next edition I would be happy to see a section dedicated to troubleshooting, debugging and profiling services on Windows Azure. From my own experience I can say that the majority of the questions I’ve seen have been in this area, and his experience with the platform as well as his independent view will allow him to provide really good troubleshooting tips and helpful workarounds.

I know Neil remotely, and we have interacted a couple of times on the MSDN forums or via Twitter. I can say that he is very knowledgeable about the topic, and you can learn a lot from the book and his personal blog.

 

I hope you will find the book a good resource, and kudos to Neil for writing it!

June 21, 2011

Configuring Tomcat Logging

If you have looked at my recent posts, you know I have been playing with Java and Tomcat a lot and trying to run them on Windows Azure. One of the things I wanted to achieve was to store the Tomcat log files in a folder different from the default Tomcat location. To my surprise, configuring Tomcat logging turned out to be not so intuitive. Let’s start with the basics…

 

Where are Tomcat Logs Stored?

By default Tomcat stores the log files under
$CATALINA_BASE\logs

Where CATALINA_BASE is the folder where Tomcat is installed. If you open that folder you will see something like this:

 

06/21/2011  02:49 PM  7,534 catalina.2011-06-21.log
06/21/2011  01:37 PM      0 host-manager.2011-06-21.log
06/21/2011  02:49 PM  1,872 localhost.2011-06-21.log
06/21/2011  02:49 PM      0 localhost_access_log.2011-06-21.txt
06/21/2011  01:37 PM      0 manager.2011-06-21.log

 

For more information about what each file contains, you can read the Tomcat Logging page.

My goal was to move those log files to a folder different from

$CATALINA_BASE\logs

 

How to Configure Tomcat Logging (Really How)?

If you search Google (or Apache’s web site) you will find out that in order to configure Tomcat logging you will need to either:

  • edit the logging.properties file in $CATALINA_BASE\conf
  • or create new logging.properties and set the java.util.logging.config.file System property to point to it

The easiest way to use the second approach is to set the Environment Variable


LOGGING_CONFIG="-Djava.util.logging.config.file=[your_logging.properties_file_location]"

 

As you may expect the default logging.properties file is located in $CATALINA_BASE\conf.


Now, the hard part is that you CANNOT use Environment Variables in a Java properties file. And of course this was exactly what I wanted to do. Specifically, I wanted to use the %ROLEROOT% Environment Variable in the location path for all the log files (see What Environment Variables Can You Use in Windows Azure). The workaround is to pass the Environment Variable's value as a Java System property (i.e. the -D option for java.exe). Tomcat startup scripts use the JAVA_OPTS Environment Variable for exactly this purpose:

 

set JAVA_OPTS=-DMY_SYSTEM_PROPERTY=%MY_ENVIRONMENT_VARIABLE%

 

For Windows Azure specifically you can use the Variable tag in CSDEF:

 

<Variable name="JAVA_OPTS" value="-D[my_property_name]=%ROLEROOT%\[some_folder]" />

 

Next, in order to use the System property in the Java properties file you need to specify it in the following format:

 

${[my_property_name]}

 

Here is what I actually did. In CSDEF I set the Environment Variables as follows:

 

<Environment>
    <Variable name="TomcatLocalResourcePath"
              value="%ROLEROOT%\Approot\temp" />
    <Variable name="JAVA_OPTS"
              value="-DTomcatLocalResourcePath=%TomcatLocalResourcePath%" />
</Environment>

 

and in the logging.properties you use the Java System property as follows:

 

1catalina.org.apache.juli.FileHandler.directory = ${TomcatLocalResourcePath}
2localhost.org.apache.juli.FileHandler.directory = ${TomcatLocalResourcePath}
3manager.org.apache.juli.FileHandler.directory = ${TomcatLocalResourcePath}
4host-manager.org.apache.juli.FileHandler.directory = ${TomcatLocalResourcePath}
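Plain java.util.Properties does not expand ${...} placeholders. Tomcat's JULI log manager is what substitutes System properties when it reads logging.properties. Below is a simplified, standalone approximation of that substitution. PropertyExpander is a hypothetical name and not a Tomcat class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PropertyExpander {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with System.getProperty(name); unknown names stay as-is.
    static String expand(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = System.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Load properties text, then expand every value.
    static Properties loadExpanded(String propertiesText) throws IOException {
        Properties raw = new Properties();
        raw.load(new StringReader(propertiesText));
        Properties expanded = new Properties();
        for (String key : raw.stringPropertyNames()) {
            expanded.setProperty(key, expand(raw.getProperty(key)));
        }
        return expanded;
    }
}
```

Because the Windows path arrives through the System property and is not written into the file, its backslashes are not treated as properties-file escape characters.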

 

This is all good, however it takes care only of the following log files:

 

catalina.2011-06-21.log
host-manager.2011-06-21.log
localhost.2011-06-21.log
manager.2011-06-21.log

 

What about localhost_access_log.2011-06-21.txt? The access log file in Tomcat is not configured via the logging.properties file but in the server.xml file. You can read more about the Access Log Valve (which controls the access log) on Apache’s web site. All you need to do is set the directory attribute on the Valve tag as follows:

 

<Valve className="org.apache.catalina.valves.AccessLogValve"
               directory="${TomcatLocalResourcePath}" 
               prefix="localhost_access_log." suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b"
               resolveHosts="false"/>

 

UNIX vs. Windows vs. Java Property Files

As a final note, here is some clarification on when to use the dollar sign $, the percent sign % and the dollar sign with curly braces ${}, as I think it may be confusing for some people:

  • As you know, the dollar sign $ is used to evaluate Environment Variables in UNIX. For example, if you define the following Environment Variable in UNIX (C shell syntax):

    setenv MYTEMPPATH /usr/temp

    you can use it later on as follows:

    setenv SOMEPATH $MYTEMPPATH/new
  • In contrast, Windows uses the percent sign % to evaluate Environment Variables. Here is the same example for Windows:

    set MYTEMPPATH=C:\Temp

    and you can use it as follows:

    set SOMEPATH=%MYTEMPPATH%\new

  • Java property files read by Tomcat (such as logging.properties) use the UNIX type of format but with curly braces to evaluate System properties. For example, if you define the System property as follows:

    -DMyTempPath=C:\Temp

    you can use it in Java properties files as follows:

    some.property=${MyTempPath}
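To contrast the first two syntaxes, here is a toy Java expander for $NAME and %NAME% over an explicit map of variables. It is a simplified sketch, and VariableSyntax is a hypothetical name. Real shells handle many more cases (quoting, ${NAME} forms, undefined variables), while this sketch simply leaves unknown names untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableSyntax {
    // UNIX style: $NAME (simplified to letters, digits and underscores).
    private static final Pattern UNIX = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");
    // Windows style: %NAME%.
    private static final Pattern WINDOWS = Pattern.compile("%([^%]+)%");

    static String expandUnix(String text, Map<String, String> env) {
        return expand(UNIX, text, env);
    }

    static String expandWindows(String text, Map<String, String> env) {
        return expand(WINDOWS, text, env);
    }

    private static String expand(Pattern pattern, String text, Map<String, String> env) {
        Matcher m = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown variables are kept verbatim in this sketch.
            String value = env.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```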