Storage 101: High Availability, Part 1


Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing databases and applications to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Any permanent data loss, whether from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business.

At the same time, user expectations for the availability of data and applications continue to rise, challenging IT staff to deliver very high levels of service while IT budgets are scrutinized closely for return on investment.

This article explores three solutions designed to keep applications available, both locally and worldwide. The first is local clustering, where most high availability infrastructure has been concentrated in recent years. The second is data replication, or copying data to a remote location for disaster tolerance. The third is global clustering, an extension of local clustering that includes data replication. Together, these technologies help provide global availability for your critical applications.

Local Clustering
Clustering is a combination of two or more servers with appropriate middleware and interconnects to achieve high levels of application availability. By clustering applications with a solution such as VERITAS Cluster Server, you ensure that should a failure occur on one system, the application is migrated to a secondary system and users can continue to access their application and data.

Application Failover
The most basic concept in clustering is application failover. Application failover allows for the movement of an application and all of its dependent resources from one system to another in the event of a component, system, or application failure.

Automated failover protects the application, and therefore the user experience, from potential downtime.
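As a rough illustration of the idea, the following Python sketch models an application and its dependent resources as a single group that moves to another system when its current one fails. Every name here (Node, ServiceGroup, fail_over, the resource names) is hypothetical and chosen for illustration only; this is not the interface of VERITAS Cluster Server or any other product, and a real cluster engine would execute actual commands rather than append to a log.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ServiceGroup:
    """An application plus its dependent resources, listed in start order."""
    name: str
    resources: List[str]
    running_on: Optional[Node] = None
    log: List[str] = field(default_factory=list)

    def start_on(self, node: Node) -> None:
        # Bring resources online in dependency order (storage first, app last).
        for res in self.resources:
            self.log.append(f"online {res} on {node.name}")
        self.running_on = node

    def stop(self) -> None:
        # Take resources offline in reverse dependency order.
        if self.running_on is None:
            return
        for res in reversed(self.resources):
            self.log.append(f"offline {res} on {self.running_on.name}")
        self.running_on = None


def fail_over(group: ServiceGroup, nodes: List[Node]) -> Node:
    """Move the group to the first healthy node other than its current one."""
    current = group.running_on
    target = next((n for n in nodes if n.healthy and n is not current), None)
    if target is None:
        raise RuntimeError("no healthy node available for failover")
    if current is not None and current.healthy:
        group.stop()  # a clean shutdown is only possible on a live node
    else:
        group.running_on = None  # the failed node cannot run stop actions
    group.start_on(target)
    return target
```

A short usage example under the same assumptions: start the group on one node, mark that node as failed, and let the group restart elsewhere.

```python
nodes = [Node("server1"), Node("server2")]
db = ServiceGroup("database", ["diskgroup", "mount", "ip", "dbms"])
db.start_on(nodes[0])
nodes[0].healthy = False
fail_over(db, nodes)
```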

In addition to unplanned outages, scheduled maintenance is an important consideration for maintaining application availability. Maintenance can account for almost one third of all reported downtime. Because users expect access to the system 24 hours a day, there is no convenient window in which to take an application offline for planned maintenance.

Application clustering allows for proactive application reconfiguration to support planned downtime. One example is adding memory and processors to a database server. The operator has several methods available to perform the upgrade. The simplest is to upgrade a system where the database is not currently running, move the database to the upgraded machine, then upgrade the original database server.
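The ordering of that simplest method can be written down as a small Python sketch. The function name and the server names are hypothetical, and the function only returns a description of the steps; it does not perform any upgrade or switch.

```python
from typing import List


def rolling_upgrade(active: str, standby: str) -> List[str]:
    """Return the ordered steps for upgrading two clustered servers.

    The database is only interrupted during the single switch in step 2.
    """
    return [
        f"upgrade {standby} (database is not running here)",
        f"switch database from {active} to {standby}",
        f"upgrade {active} (now idle)",
    ]


for step in rolling_upgrade("server1", "server2"):
    print(step)
```

The point of the ordering is that hardware work always happens on a machine that is not hosting the database at that moment.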

Agents
An agent is the middleware that provides communication (translation) between the cluster engine and an application. Agents are application-aware mechanisms that start, stop, restart and monitor applications.

The actions required to bring a resource online or take it offline differ significantly for different types of resources. Bringing a disk group online, for example, requires importing the disk group. Bringing a database online requires starting the database manager process and issuing the appropriate startup command(s) to it. From the cluster engine's point of view, the same result is achieved in both cases: the resource is made available. However, the actions performed are quite different.
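One way to picture this is as a common interface with resource-specific implementations. The Python sketch below is an assumption-laden illustration, not the agent framework of any particular clustering product: the class names, method names, and action strings are all hypothetical, and each agent merely records the action a real agent would carry out.

```python
from abc import ABC, abstractmethod
from typing import List


class Agent(ABC):
    """Uniform interface the cluster engine uses for every resource type."""

    def __init__(self) -> None:
        self.is_online = False
        self.actions: List[str] = []  # what a real agent would execute

    @abstractmethod
    def online(self) -> None:
        """Make the resource available."""

    @abstractmethod
    def offline(self) -> None:
        """Make the resource unavailable."""

    def monitor(self) -> bool:
        # A real agent would probe the resource; here we report tracked state.
        return self.is_online


class DiskGroupAgent(Agent):
    def __init__(self, disk_group: str) -> None:
        super().__init__()
        self.disk_group = disk_group

    def online(self) -> None:
        self.actions.append(f"import disk group {self.disk_group}")
        self.is_online = True

    def offline(self) -> None:
        self.actions.append(f"release disk group {self.disk_group}")
        self.is_online = False


class DatabaseAgent(Agent):
    def __init__(self, database: str) -> None:
        super().__init__()
        self.database = database

    def online(self) -> None:
        self.actions.append("start database manager process")
        self.actions.append(f"issue startup command for {self.database}")
        self.is_online = True

    def offline(self) -> None:
        self.actions.append(f"issue shutdown command for {self.database}")
        self.actions.append("stop database manager process")
        self.is_online = False


def bring_online(agents: List[Agent]) -> None:
    """Engine-side logic: identical calls regardless of resource type."""
    for agent in agents:
        agent.online()
        if not agent.monitor():
            raise RuntimeError("resource failed to come online")
```

The engine-side function never inspects what kind of resource it is handling; all resource-specific knowledge stays inside the agent.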

When choosing a clustering solution, it is important to note which applications are supported. Not all applications are supported with every clustering solution.
