4 Reasons Why Your Database Should Be Multi-Region

DataStax
Oct 5, 2021

By Krithika Balagurunathan

Competing in today’s market is hard enough without having to worry about whether a critical application will go down, be it a retailer’s or bank’s mobile app or a shipping provider’s system for tracking packages and vehicles. Users now expect apps to be available and responsive, anytime and anywhere.

Redundancy is a key way to address these demanding requirements. You could build multiple instances of a stateless application to ensure availability, but doing so won’t make much of a difference if they’re coupled with a single database instance. A better approach is deploying your app and database across multiple data centers, then synchronizing the database instances to survive a data center outage.

But this alone won’t solve all your potential problems. Users connected to a region or data center that has failed can’t access your services or buy your goods. You could reroute the application to a database in a working region, but if that database is a read-only replica, users still couldn’t complete purchases, because recording a purchase requires a write; all they could do is browse your catalog. Write availability would eventually be restored, but there is a real risk that users will churn or find another vendor because of the bad experience.

Enter the multi-region database

Replicating data in multiple regions around the globe is a key way to solve a lot of potential headaches with globally distributed applications. Not only does it reduce latency for local end users, but if something goes wrong, you’ve got the redundancy required to increase availability, ensure business continuity and support disaster recovery plans.

Accomplishing this data replication requires an active/active database deployed across multiple regions or data centers. Many distributed databases are active/passive: one primary instance serves both reads and writes, and the remaining instances are read-only. In an active/active database, all the instances can accept read and write requests. There’s one database, but it’s distributed, with data replicated across multiple regions, typically on a cloud provider. Regions can span continents or sit within one country.

One of the more popular and proven active/active distributed databases is Apache Cassandra®. With Cassandra, enterprises can get solid reliability, single-digit millisecond latency and high throughput, and there’s no “primary/replica” relationship. Relational and some NoSQL databases require a laborious, manual process to fail over between primary and secondary regions when a failure strikes. Cassandra, on the other hand, is an “AP” (available and partition tolerant) database, so if a network connection is interrupted or a data center goes down, Cassandra “self-heals”: the remaining nodes keep serving reads and writes, and replicas that missed updates are brought back in sync once they are reachable again.

Now that we have a clearer picture of what a multi-region database looks like, let’s examine the use cases that benefit from this kind of architecture.

Disaster recovery

Outages of cloud provider regions and data centers are not rare. They regularly make headlines, and they can cause major business disruptions.

For retailers with large e-commerce operations, an outage could mean millions of dollars in lost revenue because users couldn’t put items from the catalog into their shopping carts. Most enterprises would far prefer a database that straddles multiple regions and fails over immediately to receiving a “sorry for the inconvenience” and credits from a cloud provider when a region goes down. Replicating a database across regions enables business continuity in a disaster by letting the app connect right away to a region that is still running.

Data sovereignty

Many countries require that data reside within a certain geography. For example, a shipping vendor might need to keep customer profile data within a certain region, such as the EU, to satisfy General Data Protection Regulation (GDPR) requirements. Other data, such as shipment status or location data, can be replicated globally because it is not regulated by local authorities. More countries are enforcing data governance rules that mandate that writes be local to a geography and that all user data be held within the country. For example, China mandates that all customer data be written first to a local data center; read operations can span other geographies. With Cassandra, replication is configured per keyspace, so it’s possible to specify which data must stay in the country and which data can be replicated elsewhere. In this manner, the enterprise can, for example, keep all customer profile data in a particular region, but replicate shipment status data across all its regions so users in multiple geographies can track a package.
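
As a rough illustration of how that split might look, here is a minimal Python sketch that builds CQL keyspace definitions using Cassandra’s NetworkTopologyStrategy, which takes a replica count per data center. The keyspace names (customer_profiles, shipment_status) and data center names (eu-west, us-east, ap-east) are hypothetical; in a real cluster the data center names must match the ones the cluster itself reports.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that lists replicas per data center."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # Data is only stored in the data centers named here.
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical layout: profiles stay in the EU, shipment status goes everywhere.
profiles_cql = create_keyspace_cql("customer_profiles", {"eu-west": 3})
shipments_cql = create_keyspace_cql(
    "shipment_status", {"eu-west": 3, "us-east": 3, "ap-east": 3}
)

print(profiles_cql)
print(shipments_cql)
```

Because the profile keyspace names only the EU data center, its rows are never replicated to the other regions, while the shipment keyspace is copied to all three. Whether a layout like this satisfies a particular regulation is a legal question the sketch does not answer.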

Latency reduction

When your app’s users reside in several geographies, having a point of presence in each region can be an important way to reduce latency. Say your data center is in Ireland, and you have data workloads and end users in China. Every request has to cross thousands of kilometers and many network hops to reach the database, which adds significant delay between the moment a user makes a request and the moment the response arrives.

To reduce latency and deliver the best user experience, you want to have that data as close to the end user as possible. If your users are global, this means replicating data in geographies where they reside.
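
To put a rough number on the Ireland-to-China example, the sketch below estimates a lower bound on round-trip time from distance alone. It assumes signals travel through fiber at about 200,000 km/s (roughly two-thirds the speed of light) along the shortest great-circle path; the city coordinates are approximate, and real routes are longer and add router and processing delay on top.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_S = 200_000.0  # assumption: about 2/3 the speed of light


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip over fiber, ignoring routing and processing time."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Approximate coordinates, used only for illustration.
DUBLIN = (53.35, -6.26)
SHANGHAI = (31.23, 121.47)

distance_km = great_circle_km(*DUBLIN, *SHANGHAI)
rtt_ms = min_round_trip_ms(distance_km)
print(f"distance: {distance_km:.0f} km, best-case round trip: {rtt_ms:.0f} ms")
```

Even this best case comes out close to a tenth of a second per round trip, before any real-world overhead, and an app that makes several database calls per page pays it each time. A replica in the user’s own region removes most of that distance.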

Active/active application architecture support

Some distributed databases are active/passive, so when the primary region goes down, you lose the ability to write transactions until a new primary database is chosen. One of the benefits of an active/active architecture is that you can write to any node. When you combine multiple instances of an application and multiple Cassandra regions, your applications can continue to serve users when a region goes down. In addition, enterprises can scale the number of instances of the application and number of regions easily, without having to touch the application code.
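
The arithmetic behind “keep serving users when a region goes down” can be sketched with Cassandra’s quorum rule, where a quorum is a majority of replicas (replication factor divided by two, rounded down, plus one). The model below is deliberately simplified: it only counts live replicas, and the region names and replica counts are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_ok(live_in_local_dc: int, local_rf: int) -> bool:
    """LOCAL_QUORUM only needs a majority within the coordinator's own data center."""
    return live_in_local_dc >= quorum(local_rf)


def global_quorum_ok(live_by_dc: dict[str, int], rf_by_dc: dict[str, int]) -> bool:
    """QUORUM needs a majority of all replicas across every data center."""
    return sum(live_by_dc.values()) >= quorum(sum(rf_by_dc.values()))


# Hypothetical cluster: two regions, three replicas each; eu-west is down.
rf_by_dc = {"us-east": 3, "eu-west": 3}
live_by_dc = {"us-east": 3, "eu-west": 0}

print(local_quorum_ok(live_by_dc["us-east"], rf_by_dc["us-east"]))  # True
print(global_quorum_ok(live_by_dc, rf_by_dc))  # False
```

In this scenario, writes that only require a majority in the surviving region still succeed, while writes that require a majority of all six replicas do not. That is why the consistency level an application chooses matters as much as the number of regions.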

How a serverless architecture makes multi-region affordable

Despite all the benefits of using a multi-region database, most enterprises haven’t adopted such an architecture. Why? The costs of operating across multiple regions are a huge barrier, because hardware has to be maintained in each region at a fixed cost regardless of how much it’s being used. Even if most users are in one region, a database of equivalent size has to be maintained in every other region. Costs then scale linearly with every region added. In other words, you’d have to scale to the expected peak that you might see in your largest region, but do so in all regions.

However, recent innovations in database technologies can address this challenge. A serverless database combined with a pay-as-you-go pricing model can shatter this barrier. The serverless component eliminates the need to size infrastructure, as the database is instantiated on demand and scaled to the size of the workload automatically. The pay-as-you-go component makes cost variable with usage and offers significant savings because, for each additional region, enterprises pay only for the writes needed to replicate data and for the data transfer charges.
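
A back-of-the-envelope comparison shows why the two models diverge. The Python sketch below uses made-up traffic figures and made-up prices; none of them come from any vendor’s price list. It assumes the provisioned model sizes every region for the largest region’s peak, and that the usage-based model bills each read once, each write once per region, and a transfer fee for each replicated copy.

```python
def provisioned_monthly_cost(peak_ops_per_sec_by_region: dict[str, int],
                             cost_per_op_per_sec: float) -> float:
    """Every region is sized for the peak of the largest region."""
    largest_peak = max(peak_ops_per_sec_by_region.values())
    return largest_peak * len(peak_ops_per_sec_by_region) * cost_per_op_per_sec


def usage_monthly_cost(reads_m_by_region: dict[str, float],
                       writes_m_by_region: dict[str, float],
                       read_price: float, write_price: float,
                       transfer_price: float) -> float:
    """Pay per million operations; each write is also applied in every other region."""
    regions = len(writes_m_by_region)
    total_reads = sum(reads_m_by_region.values())
    total_writes = sum(writes_m_by_region.values())
    read_cost = total_reads * read_price
    write_cost = total_writes * regions * write_price
    transfer_cost = total_writes * (regions - 1) * transfer_price
    return read_cost + write_cost + transfer_cost


# Hypothetical traffic: most users are in one region.
peaks = {"us-east": 10_000, "eu-west": 2_000, "ap-east": 500}  # ops/sec
reads_m = {"us-east": 900, "eu-west": 150, "ap-east": 30}      # millions/month
writes_m = {"us-east": 300, "eu-west": 50, "ap-east": 10}      # millions/month

# Hypothetical prices in dollars.
fixed = provisioned_monthly_cost(peaks, cost_per_op_per_sec=1.0)
variable = usage_monthly_cost(reads_m, writes_m,
                              read_price=0.5, write_price=1.0, transfer_price=0.25)
print(f"provisioned: ${fixed:,.2f}  usage-based: ${variable:,.2f}")
```

The absolute numbers mean nothing, since the prices are invented. The shape is the point: the provisioned bill grows with the number of regions times the largest peak, while the usage-based bill grows with actual traffic plus the replication overhead.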

There are several serverless databases on the market, and some even offer pay-as-you-go pricing. However, most are tied to a specific cloud provider, which limits data portability. DataStax Astra DB is an example of a multi-region, serverless database-as-a-service built on Cassandra that’s also multi-cloud. It can run on AWS, Azure, and Google Cloud. With a few clicks, it lets you spin up a database that scales up and down automatically in multiple regions to support disaster recovery, data sovereignty and latency requirements.

Serverless, multi-region databases are a key way to ensure that organizations can meet the expectations of app users with no patience for performance hiccups, no matter where in the world they are.

Learn about DataStax Astra DB, the serverless, multi-cloud, multi-region DBaaS built on Cassandra.
