Building a hybrid SQL Server infrastructure

The geographic distribution of cloud data centers makes it easy to configure SQL Server instances for high availability (HA) and/or disaster recovery (DR), but what good is that to you if you’re already invested in a well-tuned SQL Server system that’s firing away on all cylinders on-premises? The idea of ensuring higher levels of availability or easily adding a disaster recovery site may sound very attractive, but the thought of moving everything into the cloud to achieve those ends may give rise to cold sweats and nightmares.

But here’s the thing: The cloud is not an all-or-nothing proposition. You can mix on-prem and cloud infrastructures to create a hybrid infrastructure. Some parts of your solution will be on-prem, while other parts will be in the cloud. Ensuring that your hybrid infrastructure can deliver the availability support that you want, though, requires an awareness of both the differences between your on-prem and cloud configuration options as well as an understanding of how to bridge those differences.

Feet on the ground, head in the clouds

Let’s assume you want to continue to use your on-prem SQL Server configuration as your primary SQL system. It ain’t broke, as they say, so why move it? But if you want to protect your organization from operational downtime or infrastructure loss that might occur in the event of a catastrophe—a hurricane, for example, or a major earthquake that compromises your on-prem infrastructure—you can put another instance of your SQL Server configuration in the cloud, where it is unlikely to be affected by the local disaster. If a catastrophe takes your on-prem infrastructure offline, the infrastructure in the cloud can take over.

One way to look at the cloud, in this scenario, is as if it were a remote data center in which you have a standby server, ready to take over in a moment’s notice. There are two fundamental ways to do this. You could configure a multi-node Always On Availability Group (AG), a feature of SQL Server itself, or you could use Windows Server Failover Clustering (WSFC) to configure a multi-node SQL Server failover cluster instance (FCI).

While the former approach is specific to SQL Server and the latter is generic to Windows Server, both approaches enable you to deploy SQL Server on cluster nodes running in geographically separate locations and both orchestrate failover from a primary node (on premises, in this scenario) to a secondary node (in the cloud) in the event of a catastrophe.

Hybrid cluster configuration challenges

The challenges you’ll encounter if you view your hybrid architecture this way are twofold. First, you need to ensure that the cloud-based instance of SQL Server can access the data that your on-prem instance has been using, and how you do that depends on the approach you take.

If you’re using an AG, the AG can be configured to replicate your user-defined SQL databases from the primary (on-prem) instance of SQL Server to your secondary (cloud-based) instance of SQL Server. However, if you’re using the Basic AG feature of SQL Server Standard Edition you are limited to replicating from one node to a second (and you are limited to one database per AG). If your on-prem architecture already has two nodes configured as a Basic AG, your cloud-based instance of SQL Server would represent a third node, which Basic AGs do not support. To replicate your on-prem data to that cloud-based instance you’ll have to upgrade your on-prem and cloud-based instances of SQL Server to the Enterprise Edition (2012 or later). That’s a costly upgrade, but that’s the only way you can use AG to support replication of user-defined databases among more than two SQL Server instances.

If you’re using WSFC to create a hybrid cluster, you’ll need to think about storage in a different way. Traditionally, Windows Server failover clusters share data on a storage area network (SAN), which all the individual cluster nodes can access. No matter which node in the cluster is running the active instance of SQL Server, it can interact with the databases residing on the SAN. However, there’s no option to share a SAN in the cloud. You can configure a hybrid cluster that binds on-prem and cloud-based infrastructure, but the cloud-based VMs cannot interact with a shared on-prem SAN. Nor can a SAN be put in the cloud to be shared among cloud-based compute nodes. Every failover cluster node in the cloud must be configured with attached storage of its own, and you’ll need to deploy a mechanism to replicate the data from your on-prem storage to the storage attached to each VM running in the cloud.

The solution to this challenge is to build a SANless failover cluster using SIOS DataKeeper. SIOS DataKeeper performs block-level replication of all the data on your on-prem storage to the local storage attached to your cloud-based VM. If disaster strikes your on-prem infrastructure and the WSFC fails SQL Server over to the cloud-based cluster node, that cloud-based node can access its own copy of your SQL Server databases and can fill in for your on-prem infrastructure for as long as you need it to.

One other advantage afforded by the SANless failover cluster approach is that there is no limit on the number of databases you can replicate. Where you would need to upgrade to SQL Server Enterprise Edition to replicate your user databases to a third node in the cloud, the SANless clustering approach works with both the SQL Server Standard and Enterprise editions. While SQL Server Standard Edition is limited to two nodes in the cluster, DataKeeper allows you to replicate to a third node in the cloud with a manual recovery process. With Enterprise Edition the third node in the cloud can simply be part of the same cluster.

Inter-node communications in a hybrid environment

In addition to addressing the challenge of how to synchronize SQL Server data between your on-prem and cloud-based infrastructures, you need to address the challenge of synchronizing your data securely and expeditiously. Since one of the purposes of a DR infrastructure is to shield you from loss arising from a local disaster, you should plan to locate your DR infrastructure in a cloud data center that’s unlikely to be affected by any localized disaster that would affect your on-prem infrastructure. Such distances are going to require you to use asynchronous replication settings whether you are using an AG approach or a SANless clustering approach, as the distances between the data centers are going to be too great to support synchronous replication.

Still, to ensure that your on-prem and cloud-based instances are as close to synchronized as possible, you want to connect them using the fastest network channel you can afford to acquire. AWS, Azure, and Google all offer direct cloud connections as premium options (AWS Direct Connect, Azure Express Route, and GCP Cloud Interconnect). These options provide you with secure, high-speed ways to bypass the public Internet when communicating with your cloud-based infrastructure, but they add further cost to your configuration. If you opt not to use one of these direct connections, you’ll need to configure a VPN channel to secure your connection to the remote infrastructure. Hence your ability to synchronize your data will be gated by the speed of your Internet gateway and the overhead of the VPN you’re using.

Note, though, that no matter the speed of your network connection, if you’re using asynchronous replication your on-prem and cloud-based instances may rarely be in perfect harmony. On a busy day, your cloud-based SQL databases may always be slightly behind your active on-prem database. That is not a shortcoming of a hybrid architecture; it’s just the nature of asynchronous replication between any distant points.

As a consequence, if there’s a catastrophic incident that prompts an unanticipated failover to your cloud infrastructure, your cloud-based instance of SQL Server may come online without some of the transactions and database updates that had already been committed in the former primary instance. If at some point you can recover the data from the storage associated with the on-prem instance of SQL Server, you may be able to recover and restore those transactions. If not, they may be lost.

Hybrid configuration considerations beyond SQL Server

The failover management components built into AGs and WSFC can ensure a SQL Server failover that is as speedy and as seamless as possible in any disaster scenario. They can orchestrate interactions and updates with ancillary infrastructure components such as load balancers and DNS servers, enabling you to have your cloud instance of SQL Server fully operational, and with minimal data loss, in moments.

However, when you’re running SQL Server on-premises and configuring a hybrid architecture for DR, there are additional operational elements you should take into consideration. How you configure elements such as application or terminal servers that may ultimately interact with SQL Server is beyond the scope of this article, but we’ll leave you with a checklist of things to consider when envisioning a hybrid architecture designed for DR.

Different disasters have different scopes. An isolated fire in the data center could take your on-prem SQL Server infrastructure offline but leave your application servers and user workstations untouched. An electrical grid issue could take your entire data center offline, making both SQL Server and your application server infrastructure inaccessible, while user workstations in another location remain unaffected. A hurricane could devastate the region, bringing down your SQL Server, applications server, and the workstation infrastructure. These are just examples, for each environment is unique, but each of these scenarios calls for a distinct DR response.

DR scenario 1: SQL Server alone goes offline. Data center, application servers, and workstation infrastructure are unaffected.

DR response:

  • Failover SQL Server to the cloud infrastructure as outlined.
  • Redirect client connections from applications servers to cloud-based SQL infrastructure.

DR scenario 2: The entire data center goes offline, rendering SQL Server and your application servers inaccessible. User workstations are unaffected.

DR response:

  • Failover SQL Server to the cloud infrastructure as outlined.
  • Recover applications servers in the cloud infrastructure using tools such as Azure Site Recovery, AWS CloudEndure, or a third-party product such as Veeam.
  • Redirect client connections to the cloud infrastructure.

DR scenario 3: The entire data center and the user workstation infrastructure go offline.

DR response:

  • Failover SQL Server to the cloud infrastructure as outlined.
  • Recover application servers in the cloud infrastructure using tools such as Azure Site Recovery, AWS CloudEndure, or a third-party product such as Veeam.
  • Redirect client connections to the cloud infrastructure, as in DR scenario 2.
  • Enable users to work from home using remote desktop services running on cloud-based infrastructure.

In the end, if your on-prem infrastructure is working for you, run with it. You can take advantage of specific benefits afforded by the cloud—like its ability to provide a DR safety net far more easily than you could build one yourself—without having to move everything into the cloud. Just build that safety net properly and be sure you test it regularly to be sure it can catch you if your on-prem infrastructure takes an unanticipated fall.

Dave Bermingham is the senior technical evangelist at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past 11 years: six years as a Cluster MVP and five years as a Cloud and Datacenter Management MVP. Dave holds numerous technical certifications and has more than 30 years of IT experience, including in finance, healthcare, and education.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to

Posted by Contributor